Structural Equation Modelling in R

Author

Martin Schweinberger

Published

January 1, 2026

Introduction

This tutorial introduces Structural Equation Modelling (SEM) — a powerful and flexible family of multivariate statistical techniques that allows researchers to simultaneously model multiple relationships among variables, account for measurement error, and test theories about constructs that cannot be directly observed. Where simple linear regression models a single outcome from one or more predictors, SEM can model entire systems of relationships, including situations where the same variable acts as both a predictor and an outcome, and where some of the most important variables in a theory are not directly measurable at all.

SEM is particularly well suited to the language sciences. Much of what linguists and applied linguists care about — language anxiety, motivation, metalinguistic awareness, communicative competence, reading ability — cannot be captured in a single measurement. These are latent constructs: theoretical entities that we infer indirectly from a set of observable indicators such as questionnaire items or test scores. SEM provides a principled framework for doing exactly this, and then for examining how these latent constructs relate to one another and to observable outcomes.

SEM is increasingly recognised as a valuable tool in corpus linguistics and cognitive linguistics. Larsson, Plonsky, and Hancock (2021) make the case that path models, a fundamental building block of SEM, are well suited to the multivariate nature of corpus-linguistic data, enabling researchers to move beyond monofactorial analyses and test theoretically motivated causal structures. Fuoli (2022) provides a step-by-step introduction to SEM in R for linguists working in a cognitive-linguistic framework, demonstrating its utility for modelling the psychological effects of linguistic choices. The lavaan package (Rosseel 2012), which we use throughout this tutorial, has made full-featured SEM freely available in R.

This tutorial is aimed at beginners with no prior exposure to SEM. You do not need to have studied factor analysis or path analysis before, though familiarity with basic regression is helpful. The goal is to build conceptual understanding from the ground up and to equip you with the practical skills to fit, evaluate, and report SEM models in R.

Learning Objectives

By the end of this tutorial you will be able to:

Explain the distinction between observed and latent variables and describe why measurement error matters
Identify the two building blocks of a full SEM — the measurement model and the structural model — and describe what each specifies
Read and interpret a standard SEM path diagram
Specify a Confirmatory Factor Analysis (CFA) in lavaan model syntax
Evaluate a CFA using model fit indices (CFI, TLI, RMSEA, SRMR) and reliability coefficients (McDonald’s ω)
Extend a measurement model to a full SEM by adding structural paths
Interpret standardised path coefficients and R² values from a full SEM
Test mediation hypotheses using labelled paths and bootstrapped confidence intervals
Compare nested and non-nested SEM specifications using Δχ², AIC, and BIC
Use modification indices responsibly to diagnose model misfit
Report SEM results in accordance with current best-practice conventions in linguistics and applied linguistics

Prerequisite Tutorials

Before working through this tutorial, we recommend familiarity with the following:

Citation

Martin Schweinberger. 2026. Structural Equation Modelling in R. The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia. url: https://ladal.edu.au/tutorials/sem/sem.html (Version 2026.03.28).

Preparation and Session Set-up

Install required packages once:

install.packages("lavaan")
install.packages("semPlot")
install.packages("semTools")
install.packages("psych")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("flextable")
install.packages("checkdown")

Load packages for this session:

library(lavaan)      # SEM and CFA estimation
library(semPlot)     # path diagram visualisation
library(semTools)    # reliability and model comparison tools
library(psych)       # descriptive statistics and correlation matrices
library(dplyr)       # data manipulation
library(ggplot2)     # data visualisation
library(tidyr)       # data reshaping
library(flextable)   # formatted tables
library(checkdown)   # interactive quiz questions

The Dataset

Throughout this tutorial we use a simulated dataset inspired by research on second-language (L2) writing. The data represent 300 university students who completed a battery of questionnaire scales and an academic writing task. The dataset includes:

Language Anxiety (anx1–anx3): three Likert-scale items measuring the degree to which students feel anxious when writing in their L2 (higher = more anxious)
Writing Self-Efficacy (eff1–eff3): three items measuring students’ confidence in their L2 writing ability (higher = greater self-efficacy)
Motivation (mot1–mot3): three items measuring students’ intrinsic motivation to improve their L2 writing (higher = more motivated)
Writing Score (writing_score): a holistic score (0–100) assigned by trained raters to an in-class academic writing task

Because the data are simulated in R, no external file is needed — you can reproduce the entire analysis from the code below.

set.seed(42)
n <- 300

# Latent variable scores
anxiety  <- rnorm(n, 0, 1)
efficacy <- rnorm(n, 0, 1)
motivat  <- 0.55 * efficacy + rnorm(n, 0, sqrt(1 - 0.55^2))

# Observed indicators (loading * latent + unique error)
anx1 <- 0.78 * anxiety  + rnorm(n, 0, sqrt(1 - 0.78^2))
anx2 <- 0.72 * anxiety  + rnorm(n, 0, sqrt(1 - 0.72^2))
anx3 <- 0.80 * anxiety  + rnorm(n, 0, sqrt(1 - 0.80^2))

eff1 <- 0.80 * efficacy + rnorm(n, 0, sqrt(1 - 0.80^2))
eff2 <- 0.74 * efficacy + rnorm(n, 0, sqrt(1 - 0.74^2))
eff3 <- 0.77 * efficacy + rnorm(n, 0, sqrt(1 - 0.77^2))

mot1 <- 0.73 * motivat  + rnorm(n, 0, sqrt(1 - 0.73^2))
mot2 <- 0.76 * motivat  + rnorm(n, 0, sqrt(1 - 0.76^2))
mot3 <- 0.70 * motivat  + rnorm(n, 0, sqrt(1 - 0.70^2))

# Outcome: Writing Score
writing_score <- 55 + 10 * efficacy - 6 * anxiety + 4 * motivat + rnorm(n, 0, 6)
writing_score <- round(pmin(pmax(writing_score, 10), 100))

# Assemble data frame
semdata <- data.frame(anx1, anx2, anx3,
                      eff1, eff2, eff3,
                      mot1, mot2, mot3,
                      writing_score)

anx1	anx2	anx3	eff1	eff2	eff3	mot1	mot2	mot3	writing_score
1.658784468	0.4690270	0.7359370	-0.7364966	-0.39819829	0.843448176	-0.5256143	-0.605277180	-0.8442590	45
-0.596042145	-0.3811790	-0.5332482	0.3364021	1.18606945	0.202026840	0.2657331	0.070635168	-0.1805353	67
0.343614655	0.4858209	-0.3018609	-0.3901992	-0.01513915	0.066076858	0.5215374	0.378665185	0.4093028	53
0.222087738	0.7191464	1.0054451	0.1834295	-0.08635442	0.227155570	2.0061491	0.091068745	1.4019459	59
1.678695011	0.8993807	-0.1536211	-1.4989920	-0.53472795	-0.164400898	-1.0288670	0.244683084	-0.1740060	48
-1.934320794	0.5713374	0.1193792	-0.1359096	-0.42973643	0.057995871	1.2819090	0.396978066	0.9666851	66
1.229605361	-0.5972336	1.7314755	0.9551093	0.48355217	0.306254326	-0.2578462	0.002948307	0.3593494	58
-0.004912068	1.1310285	-0.7850238	1.4881724	1.09775461	0.538220551	1.0907636	0.018218843	0.6954823	77
1.707942087	1.7697191	2.2284754	0.2474951	-1.20542218	-1.260388261	-0.4296632	-0.298336550	0.5411556	26
-1.023768597	-0.4409694	-1.3152320	0.1323840	-0.25310831	-0.004063715	-1.6269322	-1.623040916	-2.2125952	41

Conceptual Foundations

Section Overview

What you will learn: The core ideas underpinning SEM — latent variables, measurement error, path diagrams, and the two-component structure of a full SEM.

Why it matters: SEM notation and vocabulary are quite different from ordinary regression. Building a solid conceptual foundation before fitting models prevents common misinterpretations.

Observed vs. latent variables

A fundamental distinction in SEM is between variables you can observe directly and those you cannot.

Observed (manifest) variables are things you actually measure and record: a Likert-scale item, a test score, a reaction time, a corpus frequency count. They appear as columns in your dataset.

Latent variables are theoretical constructs that you cannot measure directly. Language anxiety, motivation, and writing self-efficacy are classic examples from applied linguistics. No single questionnaire item perfectly captures any of these constructs — each item is merely a fallible indicator. Latent variables are never columns in your dataset; instead, they are modelled as common causes of their observed indicators.

This distinction matters because measurement error is unavoidable whenever we use observed items to represent theoretical constructs. If we ignore this error — for example, by averaging questionnaire items and treating the result as if it were the true construct — we introduce attenuation bias into our estimates of relationships. SEM addresses this explicitly: it partitions the variance in each observed indicator into a part explained by the underlying latent variable and a part attributed to unique error (random noise plus any systematic variance not shared with the other indicators). As Larsson, Plonsky, and Hancock (2021) note, treating latent variables such as motivation and proficiency as observed (e.g., by using composite scores) leads to underestimation of relationships, which is one of the key arguments for using SEM in language research.
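The attenuation effect can be illustrated with a small simulation. The sketch below is not part of the tutorial's dataset: the sample size, the true correlation of 0.60, the loading of 0.70, and the helper `make_items()` are hypothetical choices made only for illustration.

```r
# Illustrative simulation of attenuation bias (all values are hypothetical)
set.seed(123)
n   <- 5000     # large sample so that sampling noise is small
rho <- 0.60     # true correlation between the two latent variables

# Two standardised latent variables correlated at rho
x_true <- rnorm(n)
y_true <- rho * x_true + rnorm(n, 0, sqrt(1 - rho^2))

# Helper: k fallible indicators, each = loading * latent + unique error
make_items <- function(latent, loading = 0.70, k = 3) {
  sapply(seq_len(k), function(i) {
    loading * latent + rnorm(length(latent), 0, sqrt(1 - loading^2))
  })
}

# Composite (mean) scores, as used when measurement error is ignored
x_comp <- rowMeans(make_items(x_true))
y_comp <- rowMeans(make_items(y_true))

r_true <- cor(x_true, y_true)   # correlation between the true scores
r_comp <- cor(x_comp, y_comp)   # correlation between the composites

round(c(true = r_true, composite = r_comp), 2)
```

With three indicators loading at 0.70, each composite has a reliability of about 0.74, so the composite correlation is expected to be roughly 0.60 × 0.74 ≈ 0.45 rather than 0.60. The relationship is underestimated even though nothing about the true constructs has changed.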

The two building blocks of SEM

A full structural equation model is composed of two sub-models:

Sub-model	Technical name	What it specifies
Measurement model	Confirmatory Factor Analysis (CFA)	Which observed items are indicators of which latent variables; how strongly each item loads onto its construct; how much unique error each item has
Structural model	Path model	Directional relationships among latent variables (and between latent variables and observed outcomes); regression-like paths encoding theoretical predictions

In the standard two-step approach to SEM (Anderson and Gerbing 1988), researchers first establish an adequate measurement model (Step 1) before testing the structural paths of theoretical interest (Step 2). This tutorial follows this workflow: we first build and evaluate a CFA (see Confirmatory Factor Analysis (CFA)) and then add structural paths (see Full Structural Equation Model).

Path diagrams

SEM models are almost always communicated visually through path diagrams. The notation is standardised:

Symbol	Represents
Rectangle	Observed (manifest) variable
Oval / ellipse	Latent variable
Single-headed arrow (→)	Directional path (a regression-type effect)
Double-headed curved arrow (↔︎)	Covariance or correlation
Small arrow into rectangle	Residual / unique error for that indicator
Small arrow into oval	Disturbance (residual error for an endogenous latent variable)

In a measurement model, ovals point to rectangles: the latent construct is hypothesised to cause variation in its observed indicators. In a structural model, ovals point to other ovals, encoding directional theoretical predictions among constructs.

SEM is a confirmatory, theory-driven method

Unlike Exploratory Factor Analysis (EFA), which discovers factor structure empirically from the data, SEM requires the researcher to specify the model in advance based on theory. Every path in the diagram — every arrow that is included or excluded — reflects a theoretical decision. A good model fit indicates that the specified model is consistent with the data; it does not prove the model is the only correct one. Alternative models that fit equally well are always possible (this is the problem of equivalent models). Always ground your SEM specifications in theory, not post-hoc data exploration (Kline 2023).

A conceptual map of our example

Our theoretical model for the L2 writing dataset can be described as follows:

Language Anxiety, Writing Self-Efficacy, and Motivation are latent constructs, each measured by three questionnaire items.
We expect Self-Efficacy and Anxiety to have opposite effects on Writing Score: greater self-efficacy should improve performance; greater anxiety should impair it.
Self-Efficacy is also expected to influence Motivation (students who feel more capable tend to be more motivated), and Motivation may in turn have a positive effect on Writing Score. This indirect path constitutes a mediation hypothesis.

This conceptual model drives all the analytic choices that follow.
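Because the data are simulated, the effects built into the data-generating code are known, which makes the mediation idea concrete. The sketch below takes the coefficients from the simulation code in the Dataset section (0.55 for efficacy → motivation; 10 and 4 for efficacy and motivation in the writing score equation) and computes the direct, indirect, and total effect of self-efficacy. Object names are illustrative. These are population values on the scale of the simulated latent scores, before the writing score is rounded and truncated, so model estimates will differ.

```r
# Coefficients copied from the data-generating code (population values)
a        <- 0.55   # efficacy -> motivation
b        <- 4      # motivation -> writing score
c_direct <- 10     # efficacy -> writing score (direct path)

indirect <- a * b                # effect transmitted through motivation
total    <- c_direct + indirect  # direct plus indirect effect

c(direct = c_direct, indirect = indirect, total = total)
```

In the generating model, a one-unit increase in efficacy therefore raises the expected writing score by 10 points directly and by a further 2.2 points via motivation.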

Descriptive Statistics and Correlations

Section Overview

What you will learn: How to examine the observed variables before fitting any model.

Key steps: Descriptive statistics, distribution checks, inter-item correlations.

Before fitting any SEM, it is good practice to examine the distributions and inter-relationships of your observed variables. Severe non-normality or implausible correlations can signal problems that need to be addressed before modelling.

Descriptive statistics

psych::describe(semdata) |>
  dplyr::select(n, mean, sd, median, skew, kurtosis, min, max) |>
  round(3) |>
  tibble::rownames_to_column("Variable") |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Descriptive statistics for all observed variables.") |>
  flextable::border_outer()

Variable	n	mean	sd	median	skew	kurtosis	min	max
anx1	300	-0.018	1.022	0.029	-0.073	-0.301	-2.968	2.660
anx2	300	-0.043	0.973	-0.027	-0.075	0.118	-3.326	2.450
anx3	300	0.016	0.983	0.052	-0.039	0.074	-3.246	2.736
eff1	300	-0.032	1.012	-0.057	0.143	-0.061	-2.916	3.006
eff2	300	-0.002	1.050	-0.012	0.014	-0.184	-2.972	3.887
eff3	300	0.016	0.975	-0.001	0.165	0.303	-2.995	3.133
mot1	300	-0.109	0.983	-0.114	-0.153	-0.057	-2.735	2.438
mot2	300	-0.072	0.964	-0.035	0.187	0.168	-2.470	3.216
mot3	300	-0.071	0.959	-0.065	-0.064	-0.172	-2.668	2.821
writing_score	300	54.560	14.778	54.000	0.036	-0.199	13.000	100.000

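All questionnaire items are centred near zero with standard deviations close to 1, as expected given how they were simulated; writing_score is on its original 0–100 scale (M = 54.56, SD = 14.78). Skewness and kurtosis fall within [−1, +1] for every variable, so there is no sign of marked univariate non-normality. Maximum likelihood estimation in lavaan (Rosseel 2012) assumes multivariate normality, for which univariate normality is necessary but not sufficient, so these checks are reassuring rather than conclusive.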

Correlation matrix

A correlation matrix helps us verify that items within the same scale correlate with each other (convergent evidence) and that items from different scales correlate less strongly (discriminant evidence).

cor_mat <- cor(semdata |> dplyr::select(-writing_score)) |>
  round(2)

cor_mat |>
  as.data.frame() |>
  tibble::rownames_to_column("Variable") |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 10) |>
  flextable::fontsize(size = 10, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "Pearson correlation matrix for the nine questionnaire items.") |>
  flextable::border_outer()

Variable	anx1	anx2	anx3	eff1	eff2	eff3	mot1	mot2	mot3
anx1	1.00	0.56	0.57	0.03	-0.04	0.00	0.01	-0.03	-0.01
anx2	0.56	1.00	0.56	0.03	-0.07	0.00	0.03	-0.01	-0.01
anx3	0.57	0.56	1.00	0.01	-0.04	-0.01	0.05	-0.02	0.07
eff1	0.03	0.03	0.01	1.00	0.60	0.58	0.30	0.36	0.32
eff2	-0.04	-0.07	-0.04	0.60	1.00	0.53	0.24	0.31	0.27
eff3	0.00	0.00	-0.01	0.58	0.53	1.00	0.23	0.29	0.22
mot1	0.01	0.03	0.05	0.30	0.24	0.23	1.00	0.53	0.50
mot2	-0.03	-0.01	-0.02	0.36	0.31	0.29	0.53	1.00	0.56
mot3	-0.01	-0.01	0.07	0.32	0.27	0.22	0.50	0.56	1.00

cor_long <- cor_mat |>
  as.data.frame() |>
  tibble::rownames_to_column("Var1") |>
  tidyr::pivot_longer(-Var1, names_to = "Var2", values_to = "r")

ggplot(cor_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(r, 2)), size = 3.2) +
  scale_fill_gradient2(low = "tomato", mid = "white", high = "steelblue",
                       midpoint = 0, limits = c(-1, 1), name = "r") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank()) +
  labs(title = "Correlation heatmap: nine questionnaire items",
       x = "", y = "")

The heatmap confirms the expected pattern: items within each scale (e.g., anx1–anx3) correlate strongly with each other and more weakly with items from the other scales. The efficacy and motivation items show moderate cross-scale correlations, consistent with our theoretical expectation that the two constructs are related.
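Because the data were simulated, the observed correlations can be compared with the values implied by the data-generating code in the Dataset section. As a rough sketch: the expected correlation between two indicators of the same factor is the product of their loadings, and for indicators of two related factors that product is multiplied by the correlation between the factors (0.55 between efficacy and motivation in the simulation). The object names below are introduced only for this illustration.

```r
# Population loadings taken from the simulation code
l_anx <- c(anx1 = 0.78, anx2 = 0.72, anx3 = 0.80)
l_eff <- c(eff1 = 0.80, eff2 = 0.74, eff3 = 0.77)
l_mot <- c(mot1 = 0.73, mot2 = 0.76, mot3 = 0.70)

# Within-scale: implied r = loading_i * loading_j
implied_anx <- outer(l_anx, l_anx)
diag(implied_anx) <- 1
round(implied_anx, 2)

# Cross-scale (efficacy x motivation): loading_i * 0.55 * loading_j
implied_eff_mot <- outer(l_eff, l_mot) * 0.55
round(implied_eff_mot, 2)
```

The implied within-scale values (about 0.56 to 0.62 for the anxiety items) and cross-scale values (about 0.28 to 0.33) are close to the observed ones; the remaining differences reflect sampling error. Anxiety was simulated independently of the other two constructs, which is why its cross-scale correlations are near zero.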

Confirmatory Factor Analysis (CFA)

Section Overview

What you will learn: How to specify, fit, and evaluate a measurement model using CFA in lavaan.

Key concepts: Factor loadings, model fit indices, reliability, convergent and discriminant validity.

Why CFA before SEM: The measurement model must be established before structural paths are meaningful. If your indicators do not adequately reflect the intended latent constructs, the structural estimates will be uninterpretable.

What is Confirmatory Factor Analysis?

Confirmatory Factor Analysis (CFA) is a measurement modelling technique in which the researcher specifies in advance which observed variables (indicators) are assumed to reflect which latent factors (constructs), and then tests whether this specification is consistent with the observed data. This is what distinguishes CFA from Exploratory Factor Analysis (EFA): in EFA the factor structure is discovered from the data with no prior constraints; in CFA the factor structure is specified from theory and then confirmed (or disconfirmed) empirically.

In our example, we hypothesise three latent factors:

Anxiety (ANX), indicated by anx1, anx2, anx3
Self-Efficacy (EFF), indicated by eff1, eff2, eff3
Motivation (MOT), indicated by mot1, mot2, mot3

Specifying a CFA model in `lavaan`

The lavaan package (Rosseel 2012) uses a simple, readable model syntax. The key operator for defining a measurement model is =~, which is read as “is measured by” or “is indicated by”:

LatentVariable =~ indicator1 + indicator2 + indicator3

We specify our three-factor measurement model as follows:

cfa_model <- '
  # Measurement model
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3
'

lavaan model syntax at a glance

Operator	Meaning	Example
`=~`	Measured by (latent → indicator)	`ANX =~ anx1 + anx2`
`~`	Regressed on (structural path)	`MOT ~ EFF`
`~~`	Correlated with (covariance)	`ANX ~~ EFF`
`~1`	Intercept / mean	`anx1 ~ 1`

By default, lavaan fixes the first indicator loading to 1.0 to set the scale of each latent variable (the marker variable method), freely estimates the remaining loadings, freely estimates all indicator residuals, and freely estimates all latent variable covariances. You can change these defaults using arguments to cfa() or sem().
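As one example of changing these defaults, the sketch below contrasts the marker variable method with fixing the latent variances to 1 via `std.lv = TRUE`. To keep the block self-contained it uses the `HolzingerSwineford1939` dataset bundled with lavaan rather than the tutorial data; the model and object names are illustrative.

```r
library(lavaan)

# Three-factor model on lavaan's bundled example data (not the tutorial data)
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Default: first loading of each factor fixed to 1 (marker variable method)
fit_marker <- cfa(hs_model, data = HolzingerSwineford1939)

# Alternative: latent variances fixed to 1, all loadings freely estimated
fit_stdlv  <- cfa(hs_model, data = HolzingerSwineford1939, std.lv = TRUE)

# Compare the loadings of the first factor under the two scalings
pe_marker <- parameterEstimates(fit_marker)
pe_stdlv  <- parameterEstimates(fit_stdlv)
subset(pe_marker, op == "=~" & lhs == "visual", select = c(lhs, op, rhs, est))
subset(pe_stdlv,  op == "=~" & lhs == "visual", select = c(lhs, op, rhs, est))

# The two scalings are equivalent: the chi-square statistic is the same
fitMeasures(fit_marker, "chisq")
fitMeasures(fit_stdlv,  "chisq")
```

The choice of scaling changes the unstandardised estimates but not the model fit or the standardised solution, so it is mainly a matter of which parameters you want to interpret or test.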

Fitting the CFA model

We fit the model using lavaan::cfa(). The default estimator is Maximum Likelihood (ML), which assumes multivariate normality of the observed variables.

cfa_fit <- lavaan::cfa(cfa_model,
                       data      = semdata,
                       estimator = "ML")

summary(cfa_fit, fit.measures = TRUE, standardized = TRUE)

lavaan 0.6-21 ended normally after 29 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21

  Number of observations                           300

Model Test User Model:
                                                      
  Test statistic                                12.806
  Degrees of freedom                                24
  P-value (Chi-square)                           0.969

Model Test Baseline Model:

  Test statistic                               856.110
  Degrees of freedom                                36
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.020

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3379.871
  Loglikelihood unrestricted model (H1)      -3373.468
                                                      
  Akaike (AIC)                                6801.742
  Bayesian (BIC)                              6879.521
  Sample-size adjusted Bayesian (SABIC)       6812.922

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value H_0: RMSEA <= 0.050                    1.000
  P-value H_0: RMSEA >= 0.080                    0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.023

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ANX =~                                                                
    anx1              1.000                               0.770    0.754
    anx2              0.937    0.090   10.447    0.000    0.721    0.742
    anx3              0.965    0.092   10.479    0.000    0.743    0.757
  EFF =~                                                                
    eff1              1.000                               0.833    0.824
    eff2              0.923    0.082   11.253    0.000    0.768    0.733
    eff3              0.825    0.075   10.992    0.000    0.687    0.706
  MOT =~                                                                
    mot1              1.000                               0.664    0.677
    mot2              1.134    0.116    9.768    0.000    0.753    0.782
    mot3              1.031    0.108    9.571    0.000    0.685    0.716

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ANX ~~                                                                
    EFF              -0.004    0.046   -0.095    0.924   -0.007   -0.007
    MOT               0.003    0.038    0.093    0.926    0.007    0.007
  EFF ~~                                                                
    MOT               0.291    0.049    5.901    0.000    0.526    0.526

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anx1              0.449    0.059    7.658    0.000    0.449    0.431
   .anx2              0.424    0.053    7.990    0.000    0.424    0.449
   .anx3              0.412    0.054    7.583    0.000    0.412    0.427
   .eff1              0.328    0.054    6.081    0.000    0.328    0.321
   .eff2              0.508    0.058    8.695    0.000    0.508    0.463
   .eff3              0.475    0.051    9.277    0.000    0.475    0.502
   .mot1              0.521    0.056    9.266    0.000    0.521    0.542
   .mot2              0.359    0.054    6.701    0.000    0.359    0.388
   .mot3              0.447    0.053    8.464    0.000    0.447    0.488
    ANX               0.592    0.089    6.629    0.000    1.000    1.000
    EFF               0.693    0.092    7.551    0.000    1.000    1.000
    MOT               0.441    0.076    5.835    0.000    1.000    1.000

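This output contains four major blocks: model fit information, factor loadings (both unstandardised and standardised), latent variable covariances, and variances (residual variances of the indicators, prefixed with a dot, and variances of the latent variables).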
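The parameter count and degrees of freedom reported in the output can be checked by hand. The sketch below applies the standard counting rule, with variable names chosen here for illustration: the covariance matrix of p observed variables contains p(p + 1)/2 unique variances and covariances, and the model's degrees of freedom are that number minus the number of freely estimated parameters.

```r
p <- 9                               # observed indicators in the CFA
n_moments <- p * (p + 1) / 2         # unique variances and covariances

# Free parameters under the cfa() defaults described earlier
n_loadings  <- 6   # 9 loadings minus 3 marker loadings fixed to 1
n_residuals <- 9   # one residual variance per indicator
n_lv_var    <- 3   # variances of ANX, EFF, and MOT
n_lv_cov    <- 3   # covariances among the three latent variables
n_free <- n_loadings + n_residuals + n_lv_var + n_lv_cov

df_model <- n_moments - n_free
c(moments = n_moments, free = n_free, df = df_model)
```

The result (45 moments, 21 free parameters, 24 degrees of freedom) matches the "Number of model parameters" and "Degrees of freedom" lines in the summary.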

Interpreting factor loadings

Factor loadings express how strongly each indicator is related to its underlying latent variable. In the standardised solution (column Std.all), a loading can be interpreted like a correlation: it represents the expected change in the standardised indicator for a one-standard-deviation increase in the latent variable. Standardised loadings above 0.50 are generally considered acceptable; loadings above 0.70 are considered strong (Hair et al. 2019).
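Squaring a standardised loading gives the proportion of an indicator's variance explained by its latent variable, and one minus that value is the standardised residual variance. The sketch below uses the Std.all loadings from the summary output above (object names are illustrative) and reproduces, up to rounding, the Std.all column of the Variances block.

```r
# Standardised loadings copied from the CFA summary output
std_loading <- c(anx1 = 0.754, anx2 = 0.742, anx3 = 0.757,
                 eff1 = 0.824, eff2 = 0.733, eff3 = 0.706,
                 mot1 = 0.677, mot2 = 0.782, mot3 = 0.716)

explained <- std_loading^2     # share of variance due to the latent variable
residual  <- 1 - explained     # share attributed to unique error

round(data.frame(loading = std_loading, explained, residual), 3)
```

For anx1, for example, 0.754² ≈ 0.569 of the variance is attributed to Anxiety and the remaining 0.431 to unique error, which is the value printed for .anx1 in the summary.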

loadings_df <- lavaan::standardizedsolution(cfa_fit) |>
  dplyr::filter(op == "=~") |>
  dplyr::select(Latent = lhs, Indicator = rhs,
                Std_Loading = est.std, SE = se,
                z = z, p = pvalue) |>
  dplyr::mutate(across(where(is.numeric), ~round(.x, 3)))

loadings_df |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised CFA factor loadings with standard errors and significance tests.") |>
  flextable::border_outer()

Latent	Indicator	Std_Loading	SE	z
ANX	anx1	0.754	0.038	19.668
ANX	anx2	0.742	0.039	19.193
ANX	anx3	0.757	0.038	19.772
EFF	eff1	0.824	0.033	24.642
EFF	eff2	0.733	0.037	19.838
EFF	eff3	0.706	0.038	18.458
MOT	mot1	0.677	0.042	16.071
MOT	mot2	0.782	0.038	20.464
MOT	mot3	0.716	0.041	17.666

All standardised loadings exceed 0.50 (range: 0.677 to 0.824) and are statistically significant, indicating that each indicator is a meaningful reflection of its intended latent construct.

Model fit assessment

Fitting a CFA does not automatically produce a good model. We must evaluate how well the specified model reproduces the observed covariance structure in the data. This is done using model fit indices — statistics that summarise the discrepancy between the model-implied covariance matrix and the observed covariance matrix.

Model fit indices: what they mean and which cut-offs to use

No single fit index is sufficient. Report a combination of the following:

Index	Full name	What it measures	Acceptable	Good
χ²	Chi-square test	Overall model misfit (sensitive to N)	p > .05 (rarely achieved)	—
CFI	Comparative Fit Index	Fit relative to null model	≥ .90	≥ .95
TLI	Tucker–Lewis Index	Fit relative to null model (penalises complexity)	≥ .90	≥ .95
RMSEA	Root Mean Square Error of Approximation	Average misfit per degree of freedom	≤ .08	≤ .05
SRMR	Standardised Root Mean Square Residual	Average standardised residual	≤ .08	≤ .05

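The "Good" cut-offs for CFI and TLI follow Hu and Bentler (1999), who also proposed values close to .06 for RMSEA and close to .08 for SRMR; the ≤ .05 values shown here are stricter conventions. These are guidelines, not hard thresholds: model fit must always be evaluated in the context of model complexity and sample size (Kline 2023).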

The χ² test is almost always significant in moderate to large samples even for well-fitting models, because it is extremely sensitive to sample size. It is therefore standard practice to rely on the incremental and approximate fit indices (CFI, TLI, RMSEA, SRMR) rather than on χ² alone (Fuoli 2022).
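The sample-size sensitivity of the χ² test can be made concrete with a small calculation. Under ML the test statistic is, to a close approximation, the sample size multiplied by the minimised discrepancy between the observed and model-implied covariance matrices, so the same amount of misfit yields a larger χ² in a larger sample. In the sketch below the discrepancy value of 0.12 is hypothetical; the 24 degrees of freedom match the three-factor CFA.

```r
discrepancy <- 0.12            # hypothetical misfit per observation (held fixed)
df_model    <- 24              # degrees of freedom of the three-factor CFA
N           <- c(100, 300, 1000)

chisq <- N * discrepancy       # test statistic grows linearly with N
p_val <- pchisq(chisq, df = df_model, lower.tail = FALSE)

data.frame(N = N, chisq = chisq, p = round(p_val, 3))
```

The misfit is identical in all three rows, yet the test moves from clearly non-significant at N = 100 to clearly significant at N = 1000.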

fit_indices <- lavaan::fitMeasures(cfa_fit,
                                   c("chisq", "df", "pvalue",
                                     "cfi", "tli",
                                     "rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
                                     "srmr")) |>
  round(3)

data.frame(
  Index = c("chi-square", "df", "p (chi-square)", "CFI", "TLI",
            "RMSEA", "RMSEA 90% CI lower", "RMSEA 90% CI upper", "SRMR"),
  Value = as.numeric(fit_indices),
  Threshold = c("—", "—", "> .05", ">= .95", ">= .95",
                "<= .05", "—", "—", "<= .05")
) |>
  flextable() |>
  flextable::set_table_properties(width = .70, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "CFA model fit indices with recommended thresholds (Hu & Bentler, 1999).") |>
  flextable::border_outer()

Index	Value	Threshold
chi-square	12.806	—
df	24.000	—
p (chi-square)	0.969	> .05
CFI	1.000	>= .95
TLI	1.020	>= .95
RMSEA	0.000	<= .05
RMSEA 90% CI lower	0.000	—
RMSEA 90% CI upper	0.000	—
SRMR	0.023	<= .05
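These indices can be recomputed from the two χ² tests in the summary output (user model and baseline model). The sketch below uses the standard formulas with illustrative object names. The RMSEA line divides by N, which is an assumption about the default ML setting (some programs use N − 1); it makes no difference here.

```r
chisq_m <- 12.806;  df_m <- 24   # user (three-factor) model
chisq_b <- 856.110; df_b <- 36   # baseline (independence) model
N <- 300

# CFI: 1 minus the ratio of model to baseline non-centrality (floored at 0)
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_m - df_m, chisq_b - df_b, 0)

# TLI: compares chi-square/df ratios; it is not bounded by 1
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA: misfit per degree of freedom, floored at 0
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * N))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

Because the model χ² (12.806) is smaller than its degrees of freedom (24), the estimated non-centrality is floored at zero. That is why CFI is exactly 1 and RMSEA exactly 0, and why the unbounded TLI exceeds 1. Fit this close to perfect is expected here because the data were simulated from a model with the same three-factor structure; real data will rarely behave this way.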

Internal consistency reliability

Beyond model fit, we assess whether each scale is internally consistent — that is, whether the indicators of each latent variable reliably hang together. We use McDonald’s omega (ω), which is the preferred reliability coefficient for factor-based scales because, unlike Cronbach’s alpha, it does not assume equal factor loadings (McDonald 1999).

rel <- semTools::reliability(cfa_fit)

data.frame(
  Scale  = c("ANX (Language Anxiety)",
             "EFF (Writing Self-Efficacy)",
             "MOT (Motivation)"),
  Omega  = round(as.numeric(rel["omega", ]), 3),
  Alpha  = round(as.numeric(rel["alpha", ]), 3)
) |>
  flextable() |>
  flextable::set_table_properties(width = .70, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "McDonald's omega and Cronbach's alpha for each scale.") |>
  flextable::border_outer()

Scale	Omega	Alpha
ANX (Language Anxiety)	0.795	0.795
EFF (Writing Self-Efficacy)	0.800	0.798
MOT (Motivation)	0.769	0.769

Values of ω ≥ .70 are generally considered acceptable for research purposes; ω ≥ .80 is considered good (Nunnally 1978).
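For a one-factor scale without correlated residuals, ω can be computed directly from the factor loadings and residual variances: the squared sum of the loadings divided by that quantity plus the sum of the residual variances. The sketch below applies this to the Std.lv loadings and the residual variances printed in the CFA summary. The helper `omega_by_hand()` is introduced here for illustration and is not a semTools function.

```r
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)
omega_by_hand <- function(loadings, residual_var) {
  common <- sum(loadings)^2
  common / (common + sum(residual_var))
}

# Std.lv loadings and residual variances copied from the CFA summary
omega <- c(
  ANX = omega_by_hand(c(0.770, 0.721, 0.743), c(0.449, 0.424, 0.412)),
  EFF = omega_by_hand(c(0.833, 0.768, 0.687), c(0.328, 0.508, 0.475)),
  MOT = omega_by_hand(c(0.664, 0.753, 0.685), c(0.521, 0.359, 0.447))
)
round(omega, 3)
```

The hand-computed values agree with the Omega column in the table above, which shows that ω is simply the share of scale-score variance attributable to the common factor.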

Visualising the measurement model

The semPlot package produces path diagrams directly from a fitted lavaan object.

semPlot::semPaths(
  cfa_fit,
  what       = "std",
  layout     = "tree",
  rotation   = 2,
  edge.label.cex = 0.85,
  sizeMan    = 7,
  sizeLat    = 10,
  color      = list(lat = "steelblue", man = "lightyellow"),
  title      = FALSE,
  style      = "lisrel"
)
title("CFA measurement model — standardised solution", cex.main = 1)

Each oval represents a latent variable; each rectangle an observed indicator. The numbers on the arrows are standardised factor loadings; the numbers on the small arrows into each rectangle are standardised residual variances (unique errors).

Exercises: CFA

Q1. In a CFA path diagram, what does a single-headed arrow from an oval to a rectangle represent?

Q2. A CFA model returns CFI = .88 and RMSEA = .09. What is the most appropriate conclusion?

Q3. What is the main difference between CFA and Exploratory Factor Analysis (EFA)?

Full Structural Equation Model

Section Overview

What you will learn: How to extend a CFA measurement model by adding directional structural paths between latent variables and outcomes.

Key concepts: Endogenous vs. exogenous variables, structural paths, disturbances, standardised path coefficients.

Once we are satisfied with the measurement model, we add the structural paths — the directional hypotheses about how the latent variables relate to each other and to the writing score outcome. Our theoretical model predicts:

(1) Anxiety → Writing Score (negative effect: more anxious students perform worse)
(2) Self-Efficacy → Writing Score (positive effect)
(3) Self-Efficacy → Motivation (positive effect: more efficacious students are more motivated)
(4) Motivation → Writing Score (positive effect)

Path (3) combined with path (4) constitutes an indirect effect of Self-Efficacy on Writing Score through Motivation. This is the mediation hypothesis examined later in this tutorial.

Specifying the full SEM

In lavaan, structural paths are specified using the ~ operator, which is read as “is regressed on”:

Outcome ~ Predictor

We combine the measurement model with the structural paths in a single model string:

sem_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths ---
  MOT           ~ EFF
  writing_score ~ ANX + EFF + MOT
'

Endogenous vs. exogenous variables

In SEM terminology:

Exogenous variables have no incoming arrows (they are only predictors, never outcomes). In our model, ANX and EFF are exogenous latent variables.
Endogenous variables have at least one incoming arrow (they are outcomes of at least one other variable). MOT and writing_score are endogenous.

Endogenous variables have a disturbance (residual error) term — the part of their variance not explained by the variables pointing to them. lavaan estimates disturbances automatically.

Fitting the full SEM

We fit the full SEM using lavaan::sem(). The syntax is identical to cfa() but with the full model specification:

Code

sem_fit <- lavaan::sem(sem_model,
                       data      = semdata,
                       estimator = "ML")

summary(sem_fit, fit.measures = TRUE, standardized = TRUE)

lavaan 0.6-21 ended normally after 56 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        24

  Number of observations                           300

Model Test User Model:
                                                      
  Test statistic                                16.790
  Degrees of freedom                                31
  P-value (Chi-square)                           0.982

Model Test Baseline Model:

  Test statistic                              1250.761
  Degrees of freedom                                45
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.017

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -4417.659
  Loglikelihood unrestricted model (H1)      -4409.264
                                                      
  Akaike (AIC)                                8883.317
  Bayesian (BIC)                              8972.208
  Sample-size adjusted Bayesian (SABIC)       8896.094

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value H_0: RMSEA <= 0.050                    1.000
  P-value H_0: RMSEA >= 0.080                    0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.023

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ANX =~                                                                
    anx1              1.000                               0.765    0.749
    anx2              0.942    0.084   11.160    0.000    0.721    0.742
    anx3              0.978    0.086   11.339    0.000    0.748    0.762
  EFF =~                                                                
    eff1              1.000                               0.809    0.801
    eff2              0.973    0.072   13.596    0.000    0.787    0.751
    eff3              0.859    0.067   12.790    0.000    0.695    0.714
  MOT =~                                                                
    mot1              1.000                               0.672    0.685
    mot2              1.087    0.106   10.269    0.000    0.730    0.759
    mot3              1.044    0.103   10.096    0.000    0.702    0.733

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  MOT ~                                                                 
    EFF               0.435    0.065    6.695    0.000    0.524    0.524
  writing_score ~                                                       
    ANX              -7.257    0.795   -9.126    0.000   -5.550   -0.376
    EFF              12.811    1.001   12.801    0.000   10.365    0.702
    MOT               5.220    1.060    4.926    0.000    3.507    0.238

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ANX ~~                                                                
    EFF              -0.006    0.044   -0.145    0.885   -0.010   -0.010

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anx1              0.457    0.054    8.528    0.000    0.457    0.439
   .anx2              0.424    0.049    8.699    0.000    0.424    0.450
   .anx3              0.404    0.049    8.228    0.000    0.404    0.420
   .eff1              0.367    0.040    9.051    0.000    0.367    0.359
   .eff2              0.478    0.048   10.006    0.000    0.478    0.435
   .eff3              0.464    0.044   10.480    0.000    0.464    0.490
   .mot1              0.511    0.054    9.499    0.000    0.511    0.531
   .mot2              0.393    0.049    7.982    0.000    0.393    0.424
   .mot3              0.423    0.049    8.586    0.000    0.423    0.462
   .writing_score    28.004    5.613    4.989    0.000   28.004    0.128
    ANX               0.585    0.086    6.836    0.000    1.000    1.000
    EFF               0.655    0.082    7.938    0.000    1.000    1.000
   .MOT               0.328    0.058    5.651    0.000    0.726    0.726
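
Several numbers in this output can be checked by hand, which helps explain why TLI exceeds 1 and RMSEA is exactly 0. The sketch below uses only values copied from the summary above. The variable names are illustrative, and the TLI and CFI formulas are the standard textbook definitions rather than code taken from lavaan.

```r
# Values copied from the summary() output above
n_observed_vars <- 10        # 9 questionnaire items + writing_score
n_free_params   <- 24        # "Number of model parameters"
chisq_user <- 16.790;   df_user <- 31
chisq_base <- 1250.761; df_base <- 45

# Unique variances and covariances among the observed variables
n_moments <- n_observed_vars * (n_observed_vars + 1) / 2
df_check  <- n_moments - n_free_params

# TLI compares the chi-square/df ratios of the baseline and user models
tli <- (chisq_base / df_base - chisq_user / df_user) /
       (chisq_base / df_base - 1)

# CFI truncates the misfit (chi-square minus df) at zero
cfi <- 1 - max(chisq_user - df_user, 0) /
           max(chisq_base - df_base, chisq_user - df_user, 0)

c(df = df_check, TLI = round(tli, 3), CFI = round(cfi, 3))
```

Because the model χ² (16.79) is smaller than its degrees of freedom (31), the misfit term is truncated at zero. This is why CFI equals 1 and RMSEA equals 0, while TLI, which is not truncated, rises slightly above 1.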

Structural path estimates

Code

sem_paths_df <- lavaan::standardizedsolution(sem_fit) |>
  dplyr::filter(op == "~") |>
  dplyr::select(Outcome = lhs, Predictor = rhs,
                Std_Estimate = est.std, SE = se,
                z = z, p = pvalue) |>
  dplyr::mutate(
    # Assign significance stars from the unrounded p-values, then round for display
    Sig = dplyr::case_when(
      p < .001 ~ "***",
      p < .01  ~ "**",
      p < .05  ~ "*",
      TRUE     ~ ""
    ),
    across(where(is.numeric), ~round(.x, 3))
  )

sem_paths_df |>
  flextable() |>
  flextable::set_table_properties(width = .90, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised structural path coefficients from the full SEM.") |>
  flextable::border_outer()

Outcome	Predictor	Std_Estimate	SE	z	Sig
MOT	EFF	0.524	0.058	8.989	***
writing_score	ANX	-0.376	0.038	-10.003	***
writing_score	EFF	0.702	0.041	17.174	***
writing_score	MOT	0.238	0.045	5.234	***

Standardised path coefficients can be interpreted similarly to standardised regression coefficients (β): they indicate the expected change in the outcome (in standard deviation units) for a one-standard-deviation increase in the predictor, holding all other predictors constant.
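
To make the link between unstandardised and standardised coefficients concrete, the following sketch rescales the MOT ~ EFF estimate by hand, using only numbers printed in the summary output above. The variable names are illustrative, and the result can differ from the reported value in the last decimal because the inputs are already rounded.

```r
# Values copied from the summary() output above
b_unstd        <- 0.435   # unstandardised MOT ~ EFF estimate
var_eff        <- 0.655   # variance of EFF
dist_var_mot   <- 0.328   # disturbance (residual) variance of MOT
dist_share_mot <- 0.726   # standardised disturbance variance of MOT

sd_eff <- sqrt(var_eff)
# Total variance of MOT = disturbance variance / unexplained share
sd_mot <- sqrt(dist_var_mot / dist_share_mot)

# Standardised coefficient = unstandardised coefficient * SD(predictor) / SD(outcome)
beta_std <- b_unstd * sd_eff / sd_mot
round(beta_std, 3)
```

The result should land very close to the 0.524 reported in the table: a one-standard-deviation increase in Self-Efficacy goes with roughly half a standard deviation increase in Motivation.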

Visualising the full SEM

Code

semPlot::semPaths(
  sem_fit,
  what       = "std",
  layout     = "tree2",
  rotation   = 2,
  edge.label.cex = 0.80,
  sizeMan    = 6,
  sizeLat    = 10,
  color      = list(lat = "steelblue", man = "lightyellow"),
  title      = FALSE,
  style      = "lisrel",
  residuals  = TRUE,
  curvePivot = TRUE
)
title("Full SEM — standardised solution", cex.main = 1)

R² for endogenous variables

Code

data.frame(
  Variable = names(lavaan::inspect(sem_fit, "r2")),
  R2       = round(as.numeric(lavaan::inspect(sem_fit, "r2")), 3)
) |>
  dplyr::filter(R2 > 0) |>
  flextable() |>
  flextable::set_table_properties(width = .45, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "Proportion of variance explained (R2) for endogenous variables.") |>
  flextable::border_outer()

Variable	R2
anx1	0.561
anx2	0.550
anx3	0.580
eff1	0.641
eff2	0.565
eff3	0.510
mot1	0.469
mot2	0.576
mot3	0.538
writing_score	0.872
MOT	0.274
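
The R² values in this table follow directly from the standardised solution printed earlier. As a small check, assuming the Std.all values from the summary output, R² is one minus the standardised residual variance, and for an indicator that loads on a single factor it also equals the squared standardised loading.

```r
# Standardised residual variances (Std.all column of the Variances block)
std_resid <- c(anx1 = 0.439, MOT = 0.726, writing_score = 0.128)

# R2 is the share of variance that is not residual
r2 <- 1 - std_resid
r2

# For a single-factor indicator, R2 is also the squared standardised loading
anx1_loading <- 0.749
round(anx1_loading^2, 3)
```

Both routes give 0.561 for anx1, and the structural R² values of 0.274 (MOT) and 0.872 (writing_score) are recovered in the same way.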

Exercises: Full SEM

Q1. In the lavaan model syntax, what does the ~ operator specify?

Q2. A standardised structural path coefficient of β = −0.42 (p < .001) from Anxiety to Writing Score means:

Mediation Analysis

Section Overview

What you will learn: How to test mediation hypotheses — indirect effects of one variable on another via a third — within an SEM framework.

Key concepts: Direct effects, indirect effects, total effects, bootstrapped confidence intervals.

What is mediation?

Mediation occurs when the effect of a predictor (X) on an outcome (Y) operates — at least in part — through an intervening variable, the mediator (M). Rather than a simple direct path X → Y, the effect is transmitted via the chain X → M → Y.

In our example, the theoretical mediation hypothesis is:

Self-Efficacy (EFF) influences Writing Score both directly and indirectly by increasing Motivation (MOT), which in turn improves Writing Score.

This decomposes the total effect of Self-Efficacy on Writing Score into a direct effect (EFF → writing_score), an indirect effect via Motivation (EFF → MOT → writing_score), and the total effect (direct + indirect).

Specifying mediation in `lavaan`

lavaan uses labels to name individual paths, which can then be combined using the := operator to define new parameters such as indirect and total effects. A label is assigned by placing it, followed by *, in front of the predictor whose coefficient it names (for example, a * EFF):

Code

mediation_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (labelled for mediation) ---
  MOT           ~ a * EFF                    # path a: EFF -> MOT
  writing_score ~ b * MOT                    # path b: MOT -> writing_score
  writing_score ~ c * EFF + ANX              # path c: direct EFF -> writing_score

  # --- Defined parameters ---
  indirect := a * b                          # indirect effect of EFF via MOT
  total    := c + (a * b)                    # total effect of EFF on writing_score
'

Bootstrapped confidence intervals for indirect effects

Indirect effects are the product of two path coefficients (a × b). Their sampling distribution is often asymmetric and non-normal, which makes standard errors based on normality assumptions unreliable. The recommended approach is bootstrapping: repeatedly resampling from the data, re-fitting the model, and using the resulting distribution of indirect effect estimates to construct confidence intervals. If the 95% bootstrapped CI does not contain zero, the indirect effect is statistically significant (Fuoli 2022; Kline 2023).
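
The resampling logic described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the tutorial's analysis: the data are simulated observed variables (a hypothetical toy data frame, not semdata), the paths come from ordinary regressions rather than a latent-variable model, and the interval is a plain percentile interval rather than the bias-corrected type.

```r
set.seed(123)

# Simulated observed-variable data (hypothetical; not the tutorial's semdata)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.3 * x + rnorm(n)
toy <- data.frame(x, m, y)

# Indirect effect a * b estimated from two ordinary regressions
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]      # path a: x -> m
  b <- coef(lm(y ~ m + x, data = d))[["m"]]  # path b: m -> y, controlling for x
  a * b
}

# Resample rows with replacement and re-estimate a * b each time
boot_est <- replicate(1000, {
  idx <- sample(nrow(toy), replace = TRUE)
  indirect_effect(toy[idx, ])
})

point_est <- indirect_effect(toy)
ci <- quantile(boot_est, probs = c(0.025, 0.975))  # percentile 95% CI

point_est
ci
```

Plotting hist(boot_est) is a quick way to see whether the bootstrap distribution of the product is skewed, which is the reason a normal-theory standard error can mislead.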

Code

set.seed(42)
med_fit <- lavaan::sem(mediation_model,
                       data      = semdata,
                       estimator = "ML",
                       se        = "bootstrap",
                       bootstrap = 1000)

med_effects <- lavaan::parameterEstimates(med_fit, boot.ci.type = "bca.simple") |>
  dplyr::filter(label %in% c("a", "b", "c", "indirect", "total")) |>
  dplyr::select(Label = label, Estimate = est, SE = se,
                CI_lower = ci.lower, CI_upper = ci.upper, p = pvalue) |>
  dplyr::mutate(across(where(is.numeric), ~round(.x, 3)))

med_effects |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Direct, indirect (mediated), and total effects with bootstrapped 95% CIs (1000 resamples).") |>
  flextable::border_outer()

Label	Estimate	SE	CI_lower	CI_upper
a	0.435	0.061	0.330	0.573
b	5.220	1.136	2.970	7.599
c	12.811	1.170	10.722	15.377
indirect	2.270	0.516	1.386	3.423
total	15.080	1.072	13.180	17.404
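
The defined parameters in this table are simple functions of the labelled paths, which can be verified from the rounded estimates. The sketch below only redoes the arithmetic; the object names are illustrative.

```r
# Point estimates copied from the table above
a        <- 0.435    # EFF -> MOT
b        <- 5.220    # MOT -> writing_score
c_direct <- 12.811   # EFF -> writing_score (direct)

indirect <- a * b                # indirect := a * b
total    <- c_direct + indirect  # total    := c + (a * b)

round(c(indirect = indirect, total = total), 2)
```

The results agree with the table to two decimals (2.27 and 15.08). The third decimal can differ slightly because lavaan multiplies the unrounded estimates.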

Interpreting mediation results

To interpret mediation, we examine four questions:

1. Path a (EFF → MOT): Is Self-Efficacy a significant predictor of Motivation?
2. Path b (MOT → writing_score): Is Motivation a significant predictor of Writing Score, controlling for the other predictors?
3. Indirect effect (a × b): Is the product significant, as indicated by a 95% CI that excludes zero?
4. Direct effect c (EFF → writing_score): Does Self-Efficacy still predict Writing Score after accounting for the mediated path?

If the indirect effect is significant and the direct effect also remains significant, we have partial mediation: Motivation carries part of the effect of Self-Efficacy to Writing Score, but Self-Efficacy also has an effect above and beyond that mediated path. If the direct effect becomes non-significant while the indirect effect is significant, we have full mediation.

A note on causal language

Mediation analysis is often discussed in causal terms (“X causes Y through M”). However, causal inference from cross-sectional observational data is not straightforward. A statistically significant indirect effect demonstrates that the data are consistent with a mediation mechanism — it does not prove causation. To make stronger causal claims, researchers need longitudinal designs, experimental manipulation of the mediator, or other causal identification strategies (Kline 2023).

Exercises: Mediation

Q1. What is the indirect effect in a mediation model?

Q2. Why are bootstrapped confidence intervals preferred over standard (normal-theory) confidence intervals for indirect effects?

Model Comparison and Modification

Section Overview

What you will learn: How to compare alternative SEM specifications using formal tests and fit indices, and how to use modification indices responsibly.

Key concepts: Nested models, likelihood ratio (chi-square difference) test, AIC/BIC, modification indices.

Why compare models?

In practice, researchers often have competing theoretical models — alternative specifications that make different predictions about which paths should be present or absent. SEM provides tools for formally comparing such models. Two situations arise:

Nested models: Model A is a special case of Model B (Model A is Model B with one or more paths fixed to zero). These can be compared with a chi-square difference test (Δχ²).
Non-nested models: Neither model is a special case of the other. These are compared using information criteria (AIC, BIC): lower values indicate better fit, penalised for model complexity.

Comparing a constrained model

Suppose a reviewer argues that the direct path from Self-Efficacy to Writing Score is unnecessary and that all of Self-Efficacy’s influence on Writing Score is mediated through Motivation. We test this by fitting a constrained model with the direct EFF → writing_score path removed:

Code

constrained_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (direct EFF -> writing_score path removed) ---
  MOT           ~ EFF
  writing_score ~ ANX + MOT
'

constrained_fit <- lavaan::sem(constrained_model,
                               data      = semdata,
                               estimator = "ML")

lavaan::lavTestLRT(constrained_fit, sem_fit)


Chi-Squared Difference Test

                Df    AIC    BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)
sem_fit         31 8883.3 8972.2  16.79                                      
constrained_fit 32 9003.1 9088.3 138.62     121.83 0.63463       1  < 2.2e-16
                   
sem_fit            
constrained_fit ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A significant Δχ² (p < .05) indicates that the constrained model fits significantly worse. Here, Δχ²(1) = 121.83, p < .001: removing the direct path causes a significant deterioration in fit, providing evidence that the direct path contributes meaningfully and should be retained.
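
The test statistic reported by lavTestLRT() can be reproduced from the two model χ² values in the output above. The following sketch assumes the standard (unscaled) ML chi-square difference test; the object names are illustrative.

```r
# Model chi-square values and degrees of freedom from the output above
chisq_full <- 16.79;  df_full <- 31   # sem_fit
chisq_con  <- 138.62; df_con  <- 32   # constrained_fit

delta_chisq <- chisq_con - chisq_full   # 121.83
delta_df    <- df_con - df_full         # 1

# Upper-tail probability of a chi-square distribution with delta_df df
p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

c(delta_chisq = delta_chisq, delta_df = delta_df, p = p_value)
```

With one degree of freedom, any difference above about 3.84 is significant at the .05 level, so a difference of 121.83 leaves little doubt.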

Code

data.frame(
  Model = c("Full model (with direct EFF path)",
            "Constrained model (no direct EFF path)"),
  AIC   = round(c(AIC(sem_fit), AIC(constrained_fit)), 1),
  BIC   = round(c(BIC(sem_fit), BIC(constrained_fit)), 1)
) |>
  flextable() |>
  flextable::set_table_properties(width = .80, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Model comparison: AIC and BIC for the full and constrained models.") |>
  flextable::border_outer()

Model	AIC	BIC
Full model (with direct EFF path)	8,883.3	8,972.2
Constrained model (no direct EFF path)	9,003.1	9,088.3

The preferred model has the lower AIC (and lower BIC). A difference of more than 10 in BIC is generally considered strong evidence in favour of the model with the lower value.
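
As a worked check, AIC and BIC for the full model can be recomputed from the log-likelihood, the number of free parameters, and the sample size printed in the summary output, using the standard definitions. The object names below are illustrative.

```r
# Values from the summary() output of the full model
loglik <- -4417.659   # Loglikelihood user model (H0)
k      <- 24          # number of free parameters
n      <- 300         # number of observations

aic <- -2 * loglik + 2 * k        # penalty of 2 per parameter
bic <- -2 * loglik + k * log(n)   # penalty of log(n) per parameter

round(c(AIC = aic, BIC = bic), 1)

# Differences between the constrained and the full model (table above)
c(delta_AIC = 9003.1 - 8883.3, delta_BIC = 9088.3 - 8972.2)
```

Both differences (about 120 for AIC and 116 for BIC) are far larger than 10, so the information criteria agree with the chi-square difference test in favouring the full model.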

Modification indices

If a model fits poorly, modification indices (MIs) can help diagnose which additional paths or covariances would most improve fit. Each MI estimates how much the overall model χ² would decrease if a particular currently fixed parameter were freed; the expected parameter change (EPC) is the value that parameter would be expected to take.

Code

mi <- lavaan::modindices(sem_fit, sort. = TRUE, maximum.number = 10)

mi |>
  dplyr::select(lhs, op, rhs, mi, epc) |>
  dplyr::mutate(across(c(mi, epc), ~round(.x, 3))) |>
  dplyr::rename(LHS = lhs, Operator = op, RHS = rhs,
                MI = mi, `Expected Parameter Change` = epc) |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Top 10 modification indices (sorted by MI, descending). MI > 10 typically warrants attention.") |>
  flextable::border_outer()

LHS	Operator	RHS	MI	Expected Parameter Change
anx3	~~	mot3	5.008	0.071
ANX	=~	eff2	2.950	-0.119
mot2	~~	writing_score	2.363	-0.616
ANX	=~	eff1	2.327	0.101
eff3	~~	mot3	2.272	-0.047
anx3	~~	mot2	1.681	-0.041
MOT	=~	eff1	1.483	0.125
mot3	~~	writing_score	1.231	0.435
anx2	~~	eff2	1.213	-0.036
eff1	~~	writing_score	1.028	-0.538
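
Because each modification index approximates a χ² change with one degree of freedom, the values in this table can be compared with the usual 1-df critical value as well as with the stricter rule of thumb of 10. The sketch below does this for the three largest entries; the object names are illustrative.

```r
# Three largest modification indices from the table above
mi_top <- c("anx3 ~~ mot3"          = 5.008,
            "ANX =~ eff2"           = 2.950,
            "mot2 ~~ writing_score" = 2.363)

# Critical value of a chi-square distribution with 1 df at alpha = .05
crit_05 <- qchisq(0.95, df = 1)

data.frame(MI         = mi_top,
           above_crit = mi_top > crit_05,
           above_10   = mi_top > 10)
```

Only the anx3 ~~ mot3 residual covariance passes the nominal 3.84 threshold, and none approaches 10. With many MIs inspected at once, a single value of this size is easily produced by chance, which is consistent with the good fit of the model.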

Using modification indices responsibly

Modification indices are a double-edged sword. They are useful for diagnosing systematic misfit (e.g., correlated residuals between items that share method variance). However, acting on every high MI and re-fitting the model is a form of capitalising on chance: the revised model will fit the current sample better but may not generalise.

Rules of thumb for responsible use (Jackson, Gillaspy, and Purc-Stephenson 2009):

Theory first: only free a parameter if there is a substantive, theoretically defensible reason to do so.
One at a time: modify one parameter, re-fit, re-inspect — do not free multiple parameters simultaneously.
Cross-validate: if sample size permits, split the data and use one half to explore modifications and the other to confirm them.
Report transparently: if modifications were made post-hoc, report this explicitly and distinguish the revised model from the originally hypothesised model.

Exercises: Model Comparison

Q1. What does a significant chi-square difference test (Δχ²) between two nested models indicate?

Q2. A modification index of 24.5 suggests adding a cross-loading of anx2 onto the EFF factor. Should you add this path?

Reporting Standards

Section Overview

What you will learn: What to report in an SEM study, model reporting paragraph templates, a workflow summary table, and a reporting checklist.

Reporting SEM results clearly and completely is as important as the analysis itself.

General principles

What to report in an SEM study

Following current best practice (Kline 2023; Jackson, Gillaspy, and Purc-Stephenson 2009; Larsson, Plonsky, and Hancock 2021):

Model specification

The full theoretical rationale for the hypothesised model
Which variables are latent vs. observed; which indicators load onto which factors
Software and estimator used (e.g., “Models were estimated in R using the lavaan package (Rosseel 2012) with Maximum Likelihood estimation”)

Measurement model (CFA)

Standardised factor loadings for all indicators (with SEs and significance)
All model fit indices: χ²(df), CFI, TLI, RMSEA (with 90% CI), SRMR
Scale reliabilities (McDonald’s ω or Cronbach’s α)

Structural model

Standardised path coefficients (with SEs and significance)
R² for all endogenous variables
Model fit indices

Mediation (if applicable)

Labelled paths (a, b, c/c’), indirect effect, total effect
Bootstrapped confidence intervals (state number of resamples)
Whether partial or full mediation was found

Model comparisons (if applicable)

Δχ², Δdf, p-value for nested comparisons
AIC/BIC for non-nested comparisons

Model reporting paragraphs

CFA

A three-factor measurement model was specified a priori based on the theoretical framework, with Language Anxiety (ANX), Writing Self-Efficacy (EFF), and Motivation (MOT) each indicated by three Likert-scale items (nine indicators in total). The model was estimated using Maximum Likelihood in R (lavaan; Rosseel 2012). Model fit was excellent: χ²(df) = X.XX, CFI = .97, TLI = .96, RMSEA = .04 [90% CI: .01, .07], SRMR = .04. All standardised factor loadings were significant and exceeded 0.70 (range: .71–.82), and McDonald’s ω exceeded .80 for all three scales, indicating good reliability. The measurement model was retained for subsequent structural analysis.

Full SEM

The structural model specified directional effects of Language Anxiety, Writing Self-Efficacy, and Motivation on Writing Score, and an effect of Self-Efficacy on Motivation. Model fit was acceptable: χ²(df) = X.XX, CFI = .96, TLI = .95, RMSEA = .05 [90% CI: .02, .07], SRMR = .05. Writing Self-Efficacy was a significant positive predictor of both Motivation (β = .55, SE = .07, p < .001) and Writing Score (β = .47, SE = .08, p < .001). Language Anxiety was a significant negative predictor of Writing Score (β = −.38, SE = .07, p < .001). Motivation significantly predicted Writing Score (β = .21, SE = .07, p = .003). Together, the predictors explained 58% of the variance in Writing Score.

Mediation

To test whether the effect of Writing Self-Efficacy on Writing Score was partially mediated by Motivation, we re-estimated the model with labelled paths and requested 1000 bootstrap resamples for inference on the indirect effect (Fuoli 2022). The indirect effect of Self-Efficacy on Writing Score via Motivation was significant (unstandardised b = X.XX, 95% BCa CI [X.XX, X.XX]), indicating that part of the positive effect of self-efficacy on writing performance operates through increased motivation. The direct effect of Self-Efficacy on Writing Score remained significant after accounting for this indirect path, supporting partial mediation.

Quick reference: SEM workflow

Step	Action	Key R function(s)
1. Theoretical specification	Draw path diagram; specify which indicators load onto which factors and which structural paths are hypothesised	—
2. Descriptive checks	Examine distributions (skewness, kurtosis), correlations; check for multivariate outliers	psych::describe(); cor()
3. Confirmatory Factor Analysis	Fit measurement model with lavaan::cfa()	lavaan::cfa()
4. Evaluate measurement fit	Inspect CFI, TLI, RMSEA, SRMR against recommended thresholds	lavaan::fitMeasures()
5. Assess reliability	Compute McDonald's omega with semTools::reliability()	semTools::reliability()
6. Full SEM	Add structural paths; fit with lavaan::sem()	lavaan::sem()
7. Mediation (if applicable)	Label paths; define indirect/total effects with ':='; use se = 'bootstrap'	lavaan::sem(se = 'bootstrap')
8. Model comparison	Use lavTestLRT() for nested models; AIC/BIC for non-nested; consult MIs with theory	lavaan::lavTestLRT(); AIC(); modindices()
9. Report	Report all fit indices, standardised loadings, path coefficients, R2, and effect CIs	lavaan::standardizedsolution(); parameterEstimates()

Citation & Session Info

Citation

@manual{martinschweinberger2026structural,
  author       = {Martin Schweinberger},
  title        = {Structural Equation Modelling in R},
  year         = {2026},
  note         = {https://ladal.edu.au/tutorials/sem/sem.html},
  organization = {The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia},
  edition      = {2026.03.28},
  doi          = {10.5281/zenodo.19332953}
}

Code

sessionInfo()

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Australia/Brisbane
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] psych_2.4.12     semTools_0.5-8   semPlot_1.1.7    lavaan_0.6-21   
[5] flextable_0.9.11 tidyr_1.3.2      ggplot2_4.0.2    dplyr_1.2.0     
[9] checkdown_0.0.13

loaded via a namespace (and not attached):
  [1] Rdpack_2.6.2            mnormt_2.1.1            pbapply_1.7-2          
  [4] gridExtra_2.3           sandwich_3.1-1          fdrtool_1.2.18         
  [7] rlang_1.1.7             magrittr_2.0.3          multcomp_1.4-28        
 [10] rockchalk_1.8.157       compiler_4.4.2          png_0.1-8              
 [13] systemfonts_1.3.1       vctrs_0.7.1             reshape2_1.4.4         
 [16] OpenMx_2.21.13          quadprog_1.5-8          stringr_1.5.1          
 [19] pkgconfig_2.0.3         fastmap_1.2.0           arm_1.14-4             
 [22] backports_1.5.0         labeling_0.4.3          pbivnorm_0.6.0         
 [25] rmarkdown_2.30          markdown_2.0            nloptr_2.1.1           
 [28] ragg_1.3.3              purrr_1.0.4             xfun_0.56              
 [31] litedown_0.9            kutils_1.73             jsonlite_1.9.0         
 [34] uuid_1.2-1              jpeg_0.1-11             parallel_4.4.2         
 [37] cluster_2.1.6           R6_2.6.1                stringi_1.8.4          
 [40] RColorBrewer_1.1-3      boot_1.3-31             rpart_4.1.23           
 [43] estimability_1.5.1      Rcpp_1.1.1              knitr_1.51             
 [46] zoo_1.8-13              base64enc_0.1-6         Matrix_1.7-2           
 [49] splines_4.4.2           nnet_7.3-19             igraph_2.1.4           
 [52] tidyselect_1.2.1        rstudioapi_0.17.1       abind_1.4-8            
 [55] yaml_2.3.10             codetools_0.2-20        qgraph_1.9.8           
 [58] lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
 [61] withr_3.0.2             S7_0.2.1                askpass_1.2.1          
 [64] coda_0.19-4.1           evaluate_1.0.3          foreign_0.8-87         
 [67] survival_3.7-0          RcppParallel_5.1.10     zip_2.3.2              
 [70] xml2_1.3.6              pillar_1.10.1           carData_3.0-5          
 [73] checkmate_2.3.2         renv_1.1.7              stats4_4.4.2           
 [76] reformulas_0.4.0        generics_0.1.3          commonmark_2.0.0       
 [79] scales_1.4.0            minqa_1.2.8             gtools_3.9.5           
 [82] xtable_1.8-4            glue_1.8.0              gdtools_0.5.0          
 [85] mi_1.2                  emmeans_1.10.7          Hmisc_5.2-2            
 [88] tools_4.4.2             data.table_1.17.0       lme4_1.1-36            
 [91] openxlsx_4.2.8          mvtnorm_1.3-3           XML_3.99-0.18          
 [94] grid_4.4.2              sem_3.1-16              rbibutils_2.3          
 [97] colorspace_2.1-1        nlme_3.1-166            patchwork_1.3.0        
[100] htmlTable_2.4.3         Formula_1.2-5           cli_3.6.4              
[103] textshaping_1.0.0       officer_0.7.3           fontBitstreamVera_0.1.1
[106] glasso_1.11             corpcor_1.6.10          gtable_0.3.6           
[109] digest_0.6.39           fontquiver_0.2.1        TH.data_1.1-3          
[112] htmlwidgets_1.6.4       farver_2.1.2            htmltools_0.5.9        
[115] lifecycle_1.0.5         lisrelToR_0.3           fontLiberation_0.1.0   
[118] openssl_2.3.2           MASS_7.3-61

AI Transparency Statement

This tutorial was re-developed with the assistance of Claude (claude.ai), a large language model created by Anthropic. Claude was used to help revise the tutorial text, structure the instructional content, generate the R code examples, and write the checkdown quiz questions and feedback strings. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for the accuracy and pedagogical appropriateness of the material. The use of AI assistance is disclosed here in the interest of transparency and in accordance with emerging best practices for AI-assisted academic content creation.


References

Anderson, James C., and David W. Gerbing. 1988. “Structural Equation Modeling in Practice: A Review and Recommended Two-Step Approach.” Psychological Bulletin 103 (3): 411–23.

Fuoli, Matteo. 2022. “Structural Equation Modeling in R: A Practical Introduction for Linguists.” Data Analytics in Cognitive Linguistics: Methods and Insights 41.

Hair, Joseph F., William C. Black, Barry J. Babin, and Rolph E. Anderson. 2019. Multivariate Data Analysis. 8th ed. Andover: Cengage.

Hu, Li-tze, and Peter M. Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling: A Multidisciplinary Journal 6 (1): 1–55.

Jackson, Dennis L., J. Arthur Gillaspy, and Rebecca Purc-Stephenson. 2009. “Reporting Practices in Confirmatory Factor Analysis: An Overview and Some Recommendations.” Psychological Methods 14 (1): 6–23.

Kline, Rex B. 2023. Principles and Practice of Structural Equation Modeling. 5th ed. New York: Guilford Press.

Larsson, Tove, Luke Plonsky, and Gregory R Hancock. 2021. “On the Benefits of Structural Equation Modeling for Corpus Linguists.” Corpus Linguistics and Linguistic Theory 17 (3): 683–714.

McDonald, Roderick P. 1999. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates.

Nunnally, Jum C. 1978. Psychometric Theory. 2nd ed. New York: McGraw-Hill.

Rosseel, Yves. 2012. “lavaan: An R Package for Structural Equation Modeling.” Journal of Statistical Software 48: 1–36.

--- title: "Structural Equation Modelling in R" author: "Martin Schweinberger" date: "2026" params: title: "Structural Equation Modelling in R" author: "Martin Schweinberger" year: "2026" version: "2026.03.28" url: "https://ladal.edu.au/tutorials/sem/sem.html" institution: "The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia" description: "This tutorial introduces structural equation modelling (SEM) in R using lavaan, covering confirmatory factor analysis, path diagrams, model specification, global fit indices (CFI, RMSEA, SRMR), and model comparison using AIC and BIC. It is aimed at researchers in psycholinguistics, applied linguistics, and the social sciences who need to model complex relationships among multiple variables simultaneously." doi: "10.5281/zenodo.19332953" format: html: toc: true toc-depth: 4 code-fold: show code-tools: true theme: cosmo --- ```{r setup, echo=FALSE, message=FALSE, warning=FALSE} library(checkdown) library(dplyr) library(ggplot2) library(tidyr) library(flextable) library(lavaan) library(semPlot) library(semTools) library(psych) options(stringsAsFactors = FALSE) options("scipen" = 100, "digits" = 12) ``` ![](/images/uq1.jpg){ width=100% } # Introduction {#intro} ![](/images/gy_chili.png){ width=15% style="float:right; padding:10px" } This tutorial introduces **Structural Equation Modelling (SEM)** — a powerful and flexible family of multivariate statistical techniques that allows researchers to simultaneously model multiple relationships among variables, account for measurement error, and test theories about constructs that cannot be directly observed. 
Where [simple linear regression](/tutorials/regression/regression.html) models a single outcome from one or more predictors, SEM can model entire systems of relationships, including situations where the same variable acts as both a predictor and an outcome, and where some of the most important variables in a theory are not directly measurable at all. SEM is particularly well suited to the language sciences. Much of what linguists and applied linguists care about — *language anxiety*, *motivation*, *metalinguistic awareness*, *communicative competence*, *reading ability* — cannot be captured in a single measurement. These are **latent constructs**: theoretical entities that we infer indirectly from a set of observable indicators such as questionnaire items or test scores. SEM provides a principled framework for doing exactly this, and then for examining how these latent constructs relate to one another and to observable outcomes. SEM is increasingly recognised as a valuable tool in corpus linguistics and cognitive linguistics. @larsson2021sem make the case that path models — a fundamental building block of SEM — are well suited to the multivariate nature of corpus-linguistic data, enabling researchers to move beyond monofactorial analyses and test theoretically motivated causal structures. @fuoli2022sem provides a step-by-step introduction to SEM in R for linguists working in a cognitive-linguistic framework, demonstrating its utility for modelling the psychological effects of linguistic choices. @rosseel2012lavaan's `lavaan` package, which we use throughout this tutorial, has made full-featured SEM freely available in R. This tutorial is aimed at **beginners with no prior exposure to SEM**. You do not need to have studied factor analysis or path analysis before, though familiarity with basic regression is helpful. The goal is to build conceptual understanding from the ground up and to equip you with the practical skills to fit, evaluate, and report SEM models in R. 
::: {.callout-note}
## Learning Objectives

By the end of this tutorial you will be able to:

1. Explain the distinction between observed and latent variables and describe why measurement error matters
2. Identify the two building blocks of a full SEM (the measurement model and the structural model) and describe what each specifies
3. Read and interpret a standard SEM path diagram
4. Specify a Confirmatory Factor Analysis (CFA) in `lavaan` model syntax
5. Evaluate a CFA using model fit indices (CFI, TLI, RMSEA, SRMR) and reliability coefficients (McDonald's ω)
6. Extend a measurement model to a full SEM by adding structural paths
7. Interpret standardised path coefficients and R² values from a full SEM
8. Test mediation hypotheses using labelled paths and bootstrapped confidence intervals
9. Compare nested and non-nested SEM specifications using Δχ², AIC, and BIC
10. Use modification indices responsibly to diagnose model misfit
11. Report SEM results in accordance with current best-practice conventions in linguistics and applied linguistics
:::

::: {.callout-note}
## Prerequisite Tutorials

Before working through this tutorial, we recommend familiarity with the following:

- [Introduction to Quantitative Reasoning](/tutorials/introquant/introquant.html)
- [Basic Concepts in Quantitative Research](/tutorials/basicquant/basicquant.html)
- [Descriptive Statistics](/tutorials/dstats/dstats.html)
- [Basic Inferential Statistics](/tutorials/basicstatz/basicstatz.html)
- [Simple and Multiple Linear Regression](/tutorials/regression/regression.html)
- [Getting started with R](/tutorials/intror/intror.html)
- [Loading, saving, and generating data in R](/tutorials/load/load.html)
:::

::: {.callout-note}
## Citation

```{r citation-callout-top, echo=FALSE, results='asis'}
cat(
  params$author, ". ",
  params$year, ". *",
  params$title, "*. ",
  params$institution, ". ",
  "url: ", params$url, " ",
  "(Version ", params$version, ").",
  sep = ""
)
```
:::

---

## Preparation and Session Set-up {-}

Install required packages once:

```{r prep1, echo=TRUE, eval=FALSE, message=FALSE, warning=FALSE}
install.packages("lavaan")
install.packages("semPlot")
install.packages("semTools")
install.packages("psych")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("flextable")
install.packages("checkdown")
```

Load packages for this session:

```{r load-packages, message=FALSE, warning=FALSE}
library(lavaan)     # SEM and CFA estimation
library(semPlot)    # path diagram visualisation
library(semTools)   # reliability and model comparison tools
library(psych)      # descriptive statistics and correlation matrices
library(dplyr)      # data manipulation
library(ggplot2)    # data visualisation
library(tidyr)      # data reshaping
library(flextable)  # formatted tables
library(checkdown)  # interactive quiz questions
```

---

## The Dataset {-}

Throughout this tutorial we use a **simulated dataset** inspired by research on second-language (L2) writing. The data represent 300 university students who completed a battery of questionnaire scales and an academic writing task. The dataset includes:

- **Language Anxiety** (`anx1`–`anx3`): three Likert-scale items measuring the degree to which students feel anxious when writing in their L2 (higher = more anxious)
- **Writing Self-Efficacy** (`eff1`–`eff3`): three items measuring students' confidence in their L2 writing ability (higher = greater self-efficacy)
- **Motivation** (`mot1`–`mot3`): three items measuring students' intrinsic motivation to improve their L2 writing (higher = more motivated)
- **Writing Score** (`writing_score`): a holistic score (0–100) assigned by trained raters to an in-class academic writing task

Because the data are simulated in R, no external file is needed: you can reproduce the entire analysis from the code below.

```{r simulate-data, message=FALSE, warning=FALSE}
set.seed(42)
n <- 300

# Latent variable scores
anxiety  <- rnorm(n, 0, 1)
efficacy <- rnorm(n, 0, 1)
motivat  <- 0.55 * efficacy + rnorm(n, 0, sqrt(1 - 0.55^2))

# Observed indicators (loading * latent + unique error)
anx1 <- 0.78 * anxiety  + rnorm(n, 0, sqrt(1 - 0.78^2))
anx2 <- 0.72 * anxiety  + rnorm(n, 0, sqrt(1 - 0.72^2))
anx3 <- 0.80 * anxiety  + rnorm(n, 0, sqrt(1 - 0.80^2))
eff1 <- 0.80 * efficacy + rnorm(n, 0, sqrt(1 - 0.80^2))
eff2 <- 0.74 * efficacy + rnorm(n, 0, sqrt(1 - 0.74^2))
eff3 <- 0.77 * efficacy + rnorm(n, 0, sqrt(1 - 0.77^2))
mot1 <- 0.73 * motivat  + rnorm(n, 0, sqrt(1 - 0.73^2))
mot2 <- 0.76 * motivat  + rnorm(n, 0, sqrt(1 - 0.76^2))
mot3 <- 0.70 * motivat  + rnorm(n, 0, sqrt(1 - 0.70^2))

# Outcome: Writing Score
writing_score <- 55 + 10 * efficacy - 6 * anxiety + 4 * motivat + rnorm(n, 0, 6)
writing_score <- round(pmin(pmax(writing_score, 10), 100))

# Assemble data frame
semdata <- data.frame(anx1, anx2, anx3,
                      eff1, eff2, eff3,
                      mot1, mot2, mot3,
                      writing_score)
```

```{r view-data, echo=FALSE, message=FALSE, warning=FALSE}
semdata |>
  head(10) |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "First ten rows of the simulated L2 writing dataset (n = 300).") |>
  flextable::border_outer()
```

---

# Conceptual Foundations {#concepts}

::: {.callout-note}
## Section Overview

**What you will learn:** The core ideas underpinning SEM: latent variables, measurement error, path diagrams, and the two-component structure of a full SEM.

**Why it matters:** SEM notation and vocabulary are quite different from ordinary regression. Building a solid conceptual foundation before fitting models prevents common misinterpretations.
:::

## Observed vs. latent variables {-}

A fundamental distinction in SEM is between variables you can observe directly and those you cannot.

**Observed (manifest) variables** are things you actually measure and record: a Likert-scale item, a test score, a reaction time, a corpus frequency count. They appear as columns in your dataset.

**Latent variables** are theoretical constructs that you cannot measure directly. *Language anxiety*, *motivation*, and *writing self-efficacy* are classic examples from applied linguistics. No single questionnaire item perfectly captures any of these constructs; each item is merely a *fallible indicator*. Latent variables are never columns in your dataset; instead, they are modelled as common causes of their observed indicators.

This distinction matters because **measurement error is unavoidable** whenever we use observed items to represent theoretical constructs. If we ignore this error (for example, by averaging questionnaire items and treating the result as if it were the true construct), we introduce **attenuation bias** into our estimates of relationships. SEM addresses this explicitly: it partitions the variance in each observed indicator into a part explained by the underlying latent variable and a part attributed to unique error (random noise plus any systematic variance not shared with the other indicators).

As @larsson2021sem note, treating latent variables such as motivation and proficiency as observed (e.g., by using composite scores) leads to underestimation of relationships, which is one of the key arguments for using SEM in language research.

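The attenuation effect described above can be illustrated with a small simulation. The sketch below is separate from the tutorial dataset: the variable names (`true_x`, `noisy_x`, and so on), the true correlation of 0.60, and the reliability of 0.70 are all assumptions chosen for illustration, and each construct is represented by a single fallible measure rather than by a composite of items.

```r
# Illustrative simulation of attenuation bias (all values are assumptions)
set.seed(123)
n <- 5000

# Two error-free construct scores with a true correlation of 0.60
true_x <- rnorm(n)
true_y <- 0.60 * true_x + rnorm(n, 0, sqrt(1 - 0.60^2))

# Fallible observed measures: each has reliability 0.70
rel <- 0.70
noisy_x <- sqrt(rel) * true_x + rnorm(n, 0, sqrt(1 - rel))
noisy_y <- sqrt(rel) * true_y + rnorm(n, 0, sqrt(1 - rel))

r_true     <- cor(true_x, true_y)    # correlation between the constructs
r_observed <- cor(noisy_x, noisy_y)  # correlation between the fallible measures

# Classical attenuation formula: observed r = true r * sqrt(rel_x * rel_y)
r_expected <- 0.60 * sqrt(rel * rel)

round(c(true = r_true, observed = r_observed, expected = r_expected), 2)
```

The observed correlation is pulled towards zero by roughly the factor the classical formula predicts. A latent-variable model is designed to recover the construct-level relationship instead.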
## The two building blocks of SEM {-}

A full structural equation model is composed of two sub-models:

| Sub-model | Technical name | What it specifies |
|---|---|---|
| **Measurement model** | Confirmatory Factor Analysis (CFA) | Which observed items are indicators of which latent variables; how strongly each item loads onto its construct; how much unique error each item has |
| **Structural model** | Path model | Directional relationships among latent variables (and between latent variables and observed outcomes); regression-like paths encoding theoretical predictions |

In the standard **two-step approach** to SEM [@anderson1988sem], researchers first establish an adequate measurement model (Step 1) before testing the structural paths of theoretical interest (Step 2). This tutorial follows this workflow: we build and evaluate a CFA in the [Confirmatory Factor Analysis section](#cfa) and then add structural paths in the [Full Structural Equation Model section](#fullsem).

## Path diagrams {-}

SEM models are almost always communicated visually through **path diagrams**. The notation is standardised:

| Symbol | Represents |
|---|---|
| **Rectangle** | Observed (manifest) variable |
| **Oval / ellipse** | Latent variable |
| **Single-headed arrow** (→) | Directional path (a regression-type effect) |
| **Double-headed curved arrow** (↔) | Covariance or correlation |
| **Small arrow into rectangle** | Residual / unique error for that indicator |
| **Small arrow into oval** | Disturbance (residual error for an endogenous latent variable) |

In a **measurement model**, ovals point to rectangles: the latent construct is hypothesised to *cause* variation in its observed indicators. In a **structural model**, ovals point to other ovals, encoding directional theoretical predictions among constructs.

::: {.callout-important}
## SEM is a confirmatory, theory-driven method

Unlike Exploratory Factor Analysis (EFA), which discovers factor structure empirically from the data, SEM requires the researcher to **specify the model in advance** based on theory. Every path in the diagram, that is, every arrow that is included or excluded, reflects a theoretical decision.

A good model fit indicates that the specified model is *consistent* with the data; it does not prove the model is the only correct one. Alternative models that fit equally well are always possible (this is the **problem of equivalent models**). Always ground your SEM specifications in theory, not post-hoc data exploration [@kline2023principles].
:::

## A conceptual map of our example {-}

Our theoretical model for the L2 writing dataset can be described as follows:

1. **Language Anxiety**, **Writing Self-Efficacy**, and **Motivation** are latent constructs, each measured by three questionnaire items.
2. We expect Self-Efficacy and Anxiety to have opposite effects on **Writing Score**: greater self-efficacy should improve performance; greater anxiety should impair it.
3. Self-Efficacy is also expected to influence Motivation (students who feel more capable tend to be more motivated), and Motivation may in turn have a positive effect on Writing Score. This indirect path constitutes a **mediation** hypothesis.

This conceptual model drives all the analytic choices that follow.

---

# Descriptive Statistics and Correlations {#descriptives}

::: {.callout-note}
## Section Overview

**What you will learn:** How to examine the observed variables before fitting any model.

**Key steps:** Descriptive statistics, distribution checks, inter-item correlations.
:::

Before fitting any SEM, it is good practice to examine the distributions and inter-relationships of your observed variables. Severe non-normality or implausible correlations can signal problems that need to be addressed before modelling.

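Because maximum likelihood estimation rests on *multivariate* normality, a multivariate check can complement item-level skewness and kurtosis. The sketch below computes Mardia's multivariate kurtosis from first principles in base R. The helper name `mardia_kurtosis()` is hypothetical (it is not part of any package used in this tutorial), and the toy data are simulated purely for illustration.

```r
# Hypothetical helper: Mardia's multivariate kurtosis, computed by hand
mardia_kurtosis <- function(x) {
  x <- as.matrix(stats::na.omit(x))
  n <- nrow(x)
  p <- ncol(x)
  S   <- stats::cov(x) * (n - 1) / n             # ML covariance (divides by n)
  d2  <- stats::mahalanobis(x, colMeans(x), S)   # squared Mahalanobis distances
  b2p <- mean(d2^2)                              # Mardia's kurtosis coefficient
  # Under multivariate normality b2p is close to p(p + 2);
  # z is approximately standard normal in large samples
  z <- (b2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  c(b2p = b2p, expected = p * (p + 2), z = z)
}

# Toy example: three independent standard-normal variables
set.seed(1)
demo <- matrix(rnorm(300 * 3), ncol = 3)
res <- mardia_kurtosis(demo)
round(res, 2)
```

Applied to the tutorial data, the call would be `mardia_kurtosis(semdata[, 1:9])`. A large absolute `z` would point to multivariate non-normality, in which case a robust estimator may be worth considering. The `psych` package also provides a ready-made `mardia()` function.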
## Descriptive statistics {-}

```{r desc01, message=FALSE, warning=FALSE}
psych::describe(semdata) |>
  dplyr::select(n, mean, sd, median, skew, kurtosis, min, max) |>
  round(3) |>
  tibble::rownames_to_column("Variable") |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Descriptive statistics for all observed variables.") |>
  flextable::border_outer()
```

The nine questionnaire items are centred near zero (as expected, since they were simulated on a standardised scale), whereas `writing_score` is on its own 0–100 scale. Skewness values are within the acceptable range of [−1, +1] for all items, which suggests that the normality assumption required for maximum likelihood estimation in `lavaan` [@rosseel2012lavaan] is not substantially violated.

## Correlation matrix {-}

A correlation matrix helps us verify that items within the same scale correlate with each other (convergent evidence) and that items from different scales correlate less strongly (discriminant evidence).

```{r corr01, message=FALSE, warning=FALSE}
cor_mat <- cor(semdata |> dplyr::select(-writing_score)) |>
  round(2)

cor_mat |>
  as.data.frame() |>
  tibble::rownames_to_column("Variable") |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 10) |>
  flextable::fontsize(size = 10, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "Pearson correlation matrix for the nine questionnaire items.") |>
  flextable::border_outer()
```

```{r corr-viz, message=FALSE, warning=FALSE}
cor_long <- cor_mat |>
  as.data.frame() |>
  tibble::rownames_to_column("Var1") |>
  tidyr::pivot_longer(-Var1, names_to = "Var2", values_to = "r")

ggplot(cor_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(r, 2)), size = 3.2) +
  scale_fill_gradient2(low = "tomato", mid = "white", high = "steelblue",
                       midpoint = 0, limits = c(-1, 1), name = "r") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank()) +
  labs(title = "Correlation heatmap: nine questionnaire items", x = "", y = "")
```

The heatmap confirms the expected pattern: items within each scale (e.g., `anx1`–`anx3`) correlate strongly with each other and more weakly with items from the other scales. The efficacy and motivation items show moderate cross-scale correlations, consistent with our theoretical expectation that the two constructs are related.

---

# Confirmatory Factor Analysis (CFA) {#cfa}

::: {.callout-note}
## Section Overview

**What you will learn:** How to specify, fit, and evaluate a measurement model using CFA in `lavaan`.

**Key concepts:** Factor loadings, model fit indices, reliability, convergent and discriminant validity.

**Why CFA before SEM:** The measurement model must be established before structural paths are meaningful.
If your indicators do not adequately reflect the intended latent constructs, the structural estimates will be uninterpretable.
:::

## What is Confirmatory Factor Analysis? {-}

**Confirmatory Factor Analysis (CFA)** is a measurement modelling technique in which the researcher specifies in advance which observed variables (indicators) are assumed to reflect which latent factors (constructs), and then tests whether this specification is consistent with the observed data.

This is what distinguishes CFA from **Exploratory Factor Analysis (EFA)**: in EFA the factor structure is *discovered* from the data with no prior constraints; in CFA the factor structure is *specified* from theory and then *confirmed* (or disconfirmed) empirically.

In our example, we hypothesise three latent factors:

- **Anxiety** (*ANX*), indicated by `anx1`, `anx2`, `anx3`
- **Self-Efficacy** (*EFF*), indicated by `eff1`, `eff2`, `eff3`
- **Motivation** (*MOT*), indicated by `mot1`, `mot2`, `mot3`

## Specifying a CFA model in `lavaan` {-}

The `lavaan` package [@rosseel2012lavaan] uses a simple, readable model syntax.
The key operator for defining a measurement model is `=~`, which is read as *"is measured by"* or *"is indicated by"*:

```
LatentVariable =~ indicator1 + indicator2 + indicator3
```

We specify our three-factor measurement model as follows:

```{r cfa-spec, message=FALSE, warning=FALSE}
cfa_model <- '
  # Measurement model
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3
'
```

::: {.callout-note}
## `lavaan` model syntax at a glance

| Operator | Meaning | Example |
|---|---|---|
| `=~` | Measured by (latent → indicator) | `ANX =~ anx1 + anx2` |
| `~` | Regressed on (structural path) | `MOT ~ EFF` |
| `~~` | Correlated with (covariance) | `ANX ~~ EFF` |
| `~1` | Intercept / mean | `anx1 ~ 1` |

By default, `lavaan` fixes the first indicator loading to 1.0 to set the scale of each latent variable (the **marker variable** method), freely estimates the remaining loadings, freely estimates all indicator residuals, and freely estimates all latent variable covariances. You can change these defaults using arguments to `cfa()` or `sem()`.
:::

## Fitting the CFA model {-}

We fit the model using `lavaan::cfa()`. The default estimator is Maximum Likelihood (ML), which assumes multivariate normality of the observed variables.

```{r cfa-fit, message=FALSE, warning=FALSE}
cfa_fit <- lavaan::cfa(cfa_model, data = semdata, estimator = "ML")
summary(cfa_fit, fit.measures = TRUE, standardized = TRUE)
```

This output contains three major sections: **model fit information**, **factor loadings** (both unstandardised and standardised), and **latent variable covariances**.

## Interpreting factor loadings {-}

Factor loadings express how strongly each indicator is related to its underlying latent variable. In the **standardised solution** (column `Std.all`), a loading can be interpreted like a correlation: it represents the expected change in the standardised indicator for a one-standard-deviation increase in the latent variable.
Standardised loadings above **0.50** are generally considered acceptable; loadings above **0.70** are considered strong [@hair2019multivariate].

```{r cfa-loadings, message=FALSE, warning=FALSE}
loadings_df <- lavaan::standardizedsolution(cfa_fit) |>
  dplyr::filter(op == "=~") |>
  dplyr::select(Latent = lhs, Indicator = rhs,
                Std_Loading = est.std, SE = se, z = z, p = pvalue) |>
  dplyr::mutate(across(where(is.numeric), ~round(.x, 3)))

loadings_df |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised CFA factor loadings with standard errors and significance tests.") |>
  flextable::border_outer()
```

If all standardised loadings exceed 0.50, each indicator can be regarded as a meaningful reflection of its intended latent construct.

## Model fit assessment {-}

Fitting a CFA does not automatically produce a good model. We must evaluate how well the specified model reproduces the observed covariance structure in the data. This is done using **model fit indices**: statistics that summarise the discrepancy between the model-implied covariance matrix and the observed covariance matrix.

::: {.callout-important}
## Model fit indices: what they mean and which cut-offs to use

No single fit index is sufficient.
Report a combination of the following:

| Index | Full name | What it measures | Acceptable | Good |
|---|---|---|---|---|
| **χ²** | Chi-square test | Overall model misfit (sensitive to N) | *p* > .05 (rarely achieved) | n/a |
| **CFI** | Comparative Fit Index | Fit relative to null model | ≥ .90 | ≥ .95 |
| **TLI** | Tucker–Lewis Index | Fit relative to null model (penalises complexity) | ≥ .90 | ≥ .95 |
| **RMSEA** | Root Mean Square Error of Approximation | Average misfit per degree of freedom | ≤ .08 | ≤ .05 |
| **SRMR** | Standardised Root Mean Square Residual | Average standardised residual | ≤ .08 | ≤ .05 |

These cut-offs reflect widely used conventions; @hu1999cutoff, for instance, recommend values close to .95 for CFI and TLI, close to .06 for RMSEA, and close to .08 for SRMR. These are guidelines, not hard thresholds: model fit must always be evaluated in the context of model complexity and sample size [@kline2023principles].

The χ² test is often significant in moderate to large samples even for models whose misfit is trivial, because its statistical power increases with sample size. It is therefore standard practice to rely on the incremental and approximate fit indices (CFI, TLI, RMSEA, SRMR) rather than on χ² alone [@fuoli2022sem].
:::

```{r cfa-fitindices, message=FALSE, warning=FALSE}
fit_indices <- lavaan::fitMeasures(cfa_fit,
                                   c("chisq", "df", "pvalue", "cfi", "tli",
                                     "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr")) |>
  round(3)

data.frame(
  Index = c("chi-square", "df", "p (chi-square)", "CFI", "TLI",
            "RMSEA", "RMSEA 90% CI lower", "RMSEA 90% CI upper", "SRMR"),
  Value = as.numeric(fit_indices),
  Threshold = c("n/a", "n/a", "> .05", ">= .95", ">= .95",
                "<= .05", "n/a", "n/a", "<= .05")
) |>
  flextable() |>
  flextable::set_table_properties(width = .70, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "CFA model fit indices with conventional thresholds for good fit.") |>
  flextable::border_outer()
```

## Internal consistency reliability {-}

Beyond model fit, we assess whether each scale is internally consistent, that is, whether the indicators of each latent variable reliably hang together. We use **McDonald's omega (ω)**, which is the preferred reliability coefficient for factor-based scales because, unlike Cronbach's alpha, it does not assume equal factor loadings [@mcdonald1999test].

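For intuition, ω can be computed by hand for a single-factor scale. Assuming standardised loadings λ and uncorrelated residuals, a common formulation is ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The sketch below plugs in the population loadings used to simulate the three anxiety items (0.78, 0.72, 0.80). These are the simulation inputs, not estimates from a fitted model, so the fitted value will differ slightly.

```r
# Population loadings used to simulate anx1-anx3 (taken from the simulation code)
lambda <- c(0.78, 0.72, 0.80)

# Residual (unique) variances in the standardised metric
theta <- 1 - lambda^2

# Omega: share of composite variance attributable to the common factor
omega <- sum(lambda)^2 / (sum(lambda)^2 + sum(theta))
round(omega, 3)  # 0.811
```

Because the three loadings are similar in size, Cronbach's alpha would be almost identical here. The two coefficients diverge more noticeably when loadings are unequal.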
```{r reliability, message=FALSE, warning=FALSE}
rel <- semTools::reliability(cfa_fit)

data.frame(
  Scale = c("ANX (Language Anxiety)", "EFF (Writing Self-Efficacy)", "MOT (Motivation)"),
  Omega = round(as.numeric(rel["omega", ]), 3),
  Alpha = round(as.numeric(rel["alpha", ]), 3)
) |>
  flextable() |>
  flextable::set_table_properties(width = .70, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "McDonald's omega and Cronbach's alpha for each scale.") |>
  flextable::border_outer()
```

Values of ω ≥ .70 are generally considered acceptable for research purposes; ω ≥ .80 is considered good [@nunnally1978psychometric].

## Visualising the measurement model {-}

The `semPlot` package produces path diagrams directly from a fitted `lavaan` object.

```{r cfa-plot, message=FALSE, warning=FALSE, fig.width=9, fig.height=6}
semPlot::semPaths(
  cfa_fit,
  what = "std",
  layout = "tree",
  rotation = 2,
  edge.label.cex = 0.85,
  sizeMan = 7,
  sizeLat = 10,
  color = list(lat = "steelblue", man = "lightyellow"),
  title = FALSE,
  style = "lisrel"
)
title("CFA measurement model: standardised solution", cex.main = 1)
```

Each oval represents a latent variable; each rectangle an observed indicator. The numbers on the arrows are standardised factor loadings; the numbers on the small arrows into each rectangle are standardised residual variances (unique errors).

---

::: {.callout-tip}
## Exercises: CFA
:::

**Q1. In a CFA path diagram, what does a single-headed arrow from an oval to a rectangle represent?**

```{r}
#| echo: false
#| label: "CFA_Q1"
check_question(
  "The latent variable (oval) is hypothesised to cause variation in the observed indicator (rectangle)",
  options = c(
    "The latent variable (oval) is hypothesised to cause variation in the observed indicator (rectangle)",
    "The observed indicator (rectangle) causes the latent variable (oval)",
    "The two variables are simply correlated, with no causal direction implied",
    "The arrow indicates that the two variables share measurement error"
  ),
  type = "radio",
  q_id = "CFA_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! In CFA (and SEM generally), latent variables are modelled as common causes of their observed indicators. The direction of causality runs from the latent oval to the observed rectangle. This is called a *reflective* measurement model: changes in the latent construct are reflected in corresponding changes in each indicator.",
  wrong = "Think about what a latent variable is: an unobserved construct that we assume underlies (causes) the pattern of responses on the observed items. Which direction should the arrow point?"
)
```

**Q2. A CFA model returns CFI = .88 and RMSEA = .09. What is the most appropriate conclusion?**

```{r}
#| echo: false
#| label: "CFA_Q2"
check_question(
  "The model fit is poor: both indices miss the recommended thresholds. The model should be inspected and potentially revised.",
  options = c(
    "The model fit is poor: both indices miss the recommended thresholds. The model should be inspected and potentially revised.",
    "The model fits well — CFI and RMSEA are never both good at the same time",
    "The fit is acceptable because RMSEA < .10",
    "No conclusion can be drawn without the chi-square p-value"
  ),
  type = "radio",
  q_id = "CFA_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct!
CFI = .88 falls below the commonly accepted threshold of ≥ .90 (and well below the ≥ .95 threshold for good fit). RMSEA = .09 exceeds the acceptable upper bound of ≤ .08. Both indices point to poor fit. The researcher should examine modification indices, check whether items cross-load onto the wrong factors, and consider whether the theoretical model needs revision.",
  wrong = "Check the recommended thresholds: CFI ≥ .90 (good: ≥ .95) and RMSEA ≤ .08 (good: ≤ .05). Do CFI = .88 and RMSEA = .09 meet these?"
)
```

**Q3. What is the main difference between CFA and Exploratory Factor Analysis (EFA)?**

```{r}
#| echo: false
#| label: "CFA_Q3"
check_question(
  "In CFA the researcher specifies which indicators belong to which factors in advance based on theory; in EFA the factor structure is discovered empirically from the data",
  options = c(
    "In CFA the researcher specifies which indicators belong to which factors in advance based on theory; in EFA the factor structure is discovered empirically from the data",
    "CFA always produces better-fitting models than EFA",
    "EFA requires a larger sample size than CFA",
    "CFA is used for continuous variables; EFA is used for categorical variables"
  ),
  type = "radio",
  q_id = "CFA_Q3",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! CFA is a confirmatory, theory-driven technique. The researcher decides in advance, based on theory and prior evidence, which observed items load onto which latent factors, and then tests whether this structure is consistent with the data. EFA makes no such prior commitments: it uses the data itself to determine how many factors are needed and which items load onto each. EFA is appropriate for scale development and initial exploration; CFA is appropriate for testing an established theoretical measurement structure.",
  wrong = "The key distinction is about the role of theory vs.
Which technique imposes a theoretical structure before looking at the data?" ) ``` --- # Full Structural Equation Model {#fullsem} ::: {.callout-note} ## Section Overview **What you will learn:** How to extend a CFA measurement model by adding directional structural paths between latent variables and outcomes. **Key concepts:** Endogenous vs. exogenous variables, structural paths, disturbances, standardised path coefficients. ::: Once we are satisfied with the measurement model, we add the **structural paths** — the directional hypotheses about how the latent variables relate to each other and to the writing score outcome. Our theoretical model predicts: 1. **Anxiety** → **Writing Score** (negative effect: more anxious students perform worse) 2. **Self-Efficacy** → **Writing Score** (positive effect) 3. **Self-Efficacy** → **Motivation** (positive effect: more efficacious students are more motivated) 4. **Motivation** → **Writing Score** (positive effect) Path (3) combined with path (4) constitutes an **indirect effect** of Self-Efficacy on Writing Score *through* Motivation — a mediation hypothesis examined in [Section 6](#mediation). ## Specifying the full SEM {-} In `lavaan`, structural paths are specified using the `~` operator, which is read as *"is regressed on"*: ``` Outcome ~ Predictor ``` We combine the measurement model with the structural paths in a single model string: ```{r sem-spec, message=FALSE, warning=FALSE} sem_model <- ' # --- Measurement model --- ANX =~ anx1 + anx2 + anx3 EFF =~ eff1 + eff2 + eff3 MOT =~ mot1 + mot2 + mot3 # --- Structural paths --- MOT ~ EFF writing_score ~ ANX + EFF + MOT ' ``` ::: {.callout-note} ## Endogenous vs. exogenous variables In SEM terminology: - **Exogenous variables** have no incoming arrows (they are only predictors, never outcomes). In our model, *ANX* and *EFF* are exogenous latent variables. - **Endogenous variables** have at least one incoming arrow (they are outcomes of at least one other variable). 
  *MOT* and *writing_score* are endogenous.

Endogenous variables have a **disturbance** (residual error) term: the part of their variance not explained by the variables pointing to them. `lavaan` estimates disturbances automatically.
:::

## Fitting the full SEM {-}

We fit the full SEM using `lavaan::sem()`. The call mirrors `cfa()`, but is given the full model specification:

```{r sem-fit, message=FALSE, warning=FALSE}
sem_fit <- lavaan::sem(sem_model, data = semdata, estimator = "ML")
summary(sem_fit, fit.measures = TRUE, standardized = TRUE)
```

## Structural path estimates {-}

```{r sem-paths, message=FALSE, warning=FALSE}
sem_paths_df <- lavaan::standardizedsolution(sem_fit) |>
  dplyr::filter(op == "~") |>
  dplyr::select(Outcome = lhs, Predictor = rhs,
                Std_Estimate = est.std, SE = se, z = z, p = pvalue) |>
  dplyr::mutate(
    # Flag significance from the unrounded p-values, then round for display
    Sig = dplyr::case_when(
      p < .001 ~ "***",
      p < .01  ~ "**",
      p < .05  ~ "*",
      TRUE     ~ ""
    ),
    across(where(is.numeric), ~round(.x, 3))
  )

sem_paths_df |>
  flextable() |>
  flextable::set_table_properties(width = .90, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised structural path coefficients from the full SEM.") |>
  flextable::border_outer()
```

Standardised path coefficients can be interpreted similarly to standardised regression coefficients (β): they indicate the expected change in the outcome (in standard deviation units) for a one-standard-deviation increase in the predictor, holding all other predictors constant.

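The link between unstandardised and standardised coefficients can be made concrete with ordinary regression. The sketch below uses simulated observed variables (`x1`, `x2`, and `y` are hypothetical and unrelated to the tutorial dataset) to show that a standardised slope equals the raw slope multiplied by the ratio of the predictor's standard deviation to the outcome's. It is an analogy only: in `lavaan`, the `Std.all` values for latent variables are based on model-implied rather than sample standard deviations.

```r
# Simulated observed variables (illustrative assumptions, not tutorial data)
set.seed(7)
n  <- 500
x1 <- rnorm(n, 50, 10)
x2 <- rnorm(n, 0, 2)
y  <- 3 + 0.4 * x1 - 1.5 * x2 + rnorm(n, 0, 5)

fit_raw <- lm(y ~ x1 + x2)                       # unstandardised slopes
fit_std <- lm(scale(y) ~ scale(x1) + scale(x2))  # all variables z-scored

# Convert raw slopes by hand: beta = b * sd(predictor) / sd(outcome)
b_raw        <- coef(fit_raw)[c("x1", "x2")]
beta_by_hand <- b_raw * c(sd(x1), sd(x2)) / sd(y)
beta_from_z  <- unname(coef(fit_std)[-1])

round(rbind(by_hand = beta_by_hand, from_z_scores = beta_from_z), 3)
```

Both rows of the printed table agree, which is why standardised coefficients can be compared across predictors measured on different scales.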
## Visualising the full SEM {-}

```{r sem-plot, message=FALSE, warning=FALSE, fig.width=10, fig.height=7}
semPlot::semPaths(
  sem_fit,
  what = "std",
  layout = "tree2",
  rotation = 2,
  edge.label.cex = 0.80,
  sizeMan = 6,
  sizeLat = 10,
  color = list(lat = "steelblue", man = "lightyellow"),
  title = FALSE,
  style = "lisrel",
  residuals = TRUE,
  curvePivot = TRUE
)
title("Full SEM: standardised solution", cex.main = 1)
```

## R² for endogenous variables {-}

```{r sem-rsq, message=FALSE, warning=FALSE}
data.frame(
  Variable = names(lavaan::inspect(sem_fit, "r2")),
  R2 = round(as.numeric(lavaan::inspect(sem_fit, "r2")), 3)
) |>
  dplyr::filter(R2 > 0) |>
  flextable() |>
  flextable::set_table_properties(width = .45, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "Proportion of variance explained (R2) for endogenous variables.") |>
  flextable::border_outer()
```

---

::: {.callout-tip}
## Exercises: Full SEM
:::

**Q1. In the `lavaan` model syntax, what does the `~` operator specify?**

```{r}
#| echo: false
#| label: "SEM_Q1"
check_question(
  "A directional structural path: the variable on the left is regressed on the variable on the right",
  options = c(
    "A directional structural path: the variable on the left is regressed on the variable on the right",
    "A measurement relationship: a latent variable is measured by an indicator",
    "A covariance between two variables",
    "An equality constraint between two parameters"
  ),
  type = "radio",
  q_id = "SEM_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! In lavaan syntax, `~` specifies a regression path. `Y ~ X` means Y is regressed on X: X is the predictor and Y is the outcome. This is analogous to the regression formula notation in base R (e.g., `lm(Y ~ X)`).
The three key operators are: `=~` for measurement (latent → indicator), `~` for regression (predictor → outcome), and `~~` for covariances.",
  wrong = "Think about how R's formula notation works. In `lm(Y ~ X)`, what does `~` separate? The same logic applies in lavaan."
)
```

**Q2. A standardised structural path coefficient of β = −0.42 (p < .001) from Anxiety to Writing Score means:**

```{r}
#| echo: false
#| label: "SEM_Q2"
check_question(
  "A one-standard-deviation increase in Anxiety is associated with a 0.42 standard deviation decrease in Writing Score, holding other variables constant",
  options = c(
    "A one-standard-deviation increase in Anxiety is associated with a 0.42 standard deviation decrease in Writing Score, holding other variables constant",
    "42% of the variance in Writing Score is explained by Anxiety",
    "Anxiety causes Writing Score to decrease by 42 points on the raw scale",
    "The correlation between Anxiety and Writing Score is -0.42"
  ),
  type = "radio",
  q_id = "SEM_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! Standardised path coefficients (β) are interpreted like standardised regression coefficients: a one-SD increase in the predictor is associated with a β-SD change in the outcome, controlling for other variables in the model. The negative sign confirms the expected direction: higher anxiety is associated with lower writing performance. This is not the same as a correlation (which is bivariate), nor does it tell us the proportion of variance explained (that would be R²).",
  wrong = "Standardised path coefficients have the same interpretation as standardised regression coefficients (β). Think about what a standardised coefficient tells you about the relationship between two variables measured in standard deviation units."
)
```

---

# Mediation Analysis {#mediation}

::: {.callout-note}
## Section Overview

**What you will learn:** How to test mediation hypotheses — indirect effects of one variable on another via a third — within an SEM framework.

**Key concepts:** Direct effects, indirect effects, total effects, bootstrapped confidence intervals.
:::

## What is mediation? {-}

**Mediation** occurs when the effect of a predictor (*X*) on an outcome (*Y*) operates — at least in part — *through* an intervening variable, the **mediator** (*M*). Rather than a simple direct path *X* → *Y*, the effect is transmitted via the chain *X* → *M* → *Y*.

In our example, the theoretical mediation hypothesis is:

> **Self-Efficacy** (*EFF*) influences **Writing Score** both directly and *indirectly* by increasing **Motivation** (*MOT*), which in turn improves **Writing Score**.

This decomposes the total effect of Self-Efficacy on Writing Score into a **direct effect** (*EFF* → *writing_score*), an **indirect effect** via Motivation (*EFF* → *MOT* → *writing_score*), and the **total effect** (direct + indirect).

## Specifying mediation in `lavaan` {-}

`lavaan` uses **labels** to name individual paths, which can then be combined using the `:=` operator to define new parameters such as indirect and total effects.
Labels are assigned by prefixing a path coefficient with a name followed by `*`:

```{r med-spec, message=FALSE, warning=FALSE}
mediation_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (labelled for mediation) ---
  MOT ~ a * EFF                    # path a: EFF -> MOT
  writing_score ~ b * MOT          # path b: MOT -> writing_score
  writing_score ~ c * EFF + ANX    # path c: direct EFF -> writing_score

  # --- Defined parameters ---
  indirect := a * b                # indirect effect of EFF via MOT
  total := c + (a * b)             # total effect of EFF on writing_score
'
```

## Bootstrapped confidence intervals for indirect effects {-}

Indirect effects are the *product* of two path coefficients (*a × b*). Their sampling distribution is often asymmetric and non-normal, which makes standard errors based on normality assumptions unreliable. The recommended approach is **bootstrapping**: repeatedly resampling from the data, re-fitting the model, and using the resulting distribution of indirect effect estimates to construct confidence intervals. If the 95% bootstrapped CI does not contain zero, the indirect effect is statistically significant [@fuoli2022sem; @kline2023principles].
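
The asymmetry of a product of two estimates can be illustrated with a small simulation. This is a minimal sketch and not part of the tutorial's analysis: the path values (0.30 and 0.20) and standard errors (0.10) are hypothetical, the two estimates are assumed to be independent and normally distributed, and the `skewness()` helper is defined here for illustration only. It draws from assumed sampling distributions directly rather than bootstrapping `semdata`.

```r
# Illustrative simulation: the product of two normal estimates is skewed.
# All numeric values below are hypothetical, chosen only for demonstration.
set.seed(123)
n_draws <- 100000
a_hat  <- rnorm(n_draws, mean = 0.30, sd = 0.10)  # simulated estimates of path a
b_hat  <- rnorm(n_draws, mean = 0.20, sd = 0.10)  # simulated estimates of path b
ab_hat <- a_hat * b_hat                           # simulated indirect effects

# Sample skewness: close to 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_a  <- skewness(a_hat)   # single path estimate
skew_ab <- skewness(ab_hat)  # product of the two path estimates

# Normal-theory interval (symmetric by construction) versus
# percentile interval (follows the shape of the simulated distribution)
ci_normal     <- mean(ab_hat) + c(-1, 1) * qnorm(0.975) * sd(ab_hat)
ci_percentile <- unname(quantile(ab_hat, probs = c(0.025, 0.975)))

round(c(skew_a = skew_a, skew_ab = skew_ab), 2)
round(rbind(normal = ci_normal, percentile = ci_percentile), 3)
```

Under these assumptions the single path estimate is approximately symmetric while the product is clearly right-skewed, so an interval that is forced to be symmetric around the estimate will not track the simulated distribution as closely as a percentile-based one.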
```{r med-fit, message=FALSE, warning=FALSE, cache=TRUE}
set.seed(42)
med_fit <- lavaan::sem(
  mediation_model,
  data = semdata,
  estimator = "ML",
  se = "bootstrap",
  bootstrap = 1000
)

med_effects <- lavaan::parameterEstimates(med_fit, boot.ci.type = "bca.simple") |>
  dplyr::filter(label %in% c("a", "b", "c", "indirect", "total")) |>
  dplyr::select(Label = label, Estimate = est, SE = se,
                CI_lower = ci.lower, CI_upper = ci.upper, p = pvalue) |>
  dplyr::mutate(across(where(is.numeric), ~round(.x, 3)))

med_effects |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Direct, indirect (mediated), and total effects with bootstrapped 95% CIs (1000 resamples).") |>
  flextable::border_outer()
```

## Interpreting mediation results {-}

To interpret mediation, we examine:

1. **Path *a*** (*EFF* → *MOT*): Is Self-Efficacy a significant predictor of Motivation?
2. **Path *b*** (*MOT* → *writing_score*): Is Motivation a significant predictor of Writing Score (controlling for other predictors)?
3. **Indirect effect** (*a × b*): Is the product significant, as indicated by a 95% CI that excludes zero?
4. **Direct effect *c*** (*EFF* → *writing_score*): Does Self-Efficacy still predict Writing Score after accounting for the mediation?

If the indirect effect is significant *and* the direct effect also remains significant, we have **partial mediation**: Motivation carries part of the effect of Self-Efficacy to Writing Score, but Self-Efficacy also has an effect above and beyond that mediated path. If the indirect effect is significant while the direct effect is non-significant, we have **full mediation**.

::: {.callout-note}
## A note on causal language

Mediation analysis is often discussed in causal terms ("X causes Y through M").
However, causal inference from cross-sectional observational data is not straightforward. A statistically significant indirect effect demonstrates that the data are *consistent* with a mediation mechanism — it does not prove causation. To make stronger causal claims, researchers need longitudinal designs, experimental manipulation of the mediator, or other causal identification strategies [@kline2023principles].
:::

---

::: {.callout-tip}
## Exercises: Mediation
:::

**Q1. What is the indirect effect in a mediation model?**

```{r}
#| echo: false
#| label: "MED_Q1"
check_question(
  "The product of the path from the predictor to the mediator (a) and the path from the mediator to the outcome (b): indirect = a × b",
  options = c(
    "The product of the path from the predictor to the mediator (a) and the path from the mediator to the outcome (b): indirect = a × b",
    "The direct path from the predictor to the outcome, bypassing the mediator",
    "The correlation between the predictor and the mediator",
    "The total variance in the outcome explained by all predictors"
  ),
  type = "radio",
  q_id = "MED_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! The indirect effect quantifies how much of the predictor's influence on the outcome is transmitted via the mediator. It is computed as the product of two paths: (a) the effect of the predictor on the mediator, and (b) the effect of the mediator on the outcome controlling for the predictor. If either a or b is zero, the indirect effect is zero — both links in the chain must be non-zero for mediation to occur.",
  wrong = "In a mediation chain X → M → Y, the indirect effect is the amount of X's influence on Y that travels through M. How would you quantify that using the two path coefficients a (X→M) and b (M→Y)?"
)
```

**Q2. Why are bootstrapped confidence intervals preferred over standard (normal-theory) confidence intervals for indirect effects?**

```{r}
#| echo: false
#| label: "MED_Q2"
check_question(
  "Because indirect effects are products of two path coefficients, their sampling distribution is often asymmetric and non-normal — bootstrapping does not assume normality and therefore produces more accurate CIs",
  options = c(
    "Because indirect effects are products of two path coefficients, their sampling distribution is often asymmetric and non-normal — bootstrapping does not assume normality and therefore produces more accurate CIs",
    "Because bootstrapping always produces wider, more conservative confidence intervals",
    "Because standard CIs are only valid for indirect effects with more than two paths",
    "Bootstrapped CIs are not actually preferred — standard CIs are equally appropriate"
  ),
  type = "radio",
  q_id = "MED_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! The indirect effect a×b is the product of two random variables. Even if a and b are each normally distributed, their product is not — it tends to be asymmetrically distributed, especially in smaller samples. Standard (Sobel-test) CIs assume a normal sampling distribution and are therefore symmetric around the estimate, which can misplace both limits when the true distribution is skewed. Bootstrapping resamples from the actual data and builds an empirical distribution of the indirect effect, producing CIs that can reflect the asymmetry. This is why bootstrapped CIs (percentile or bias-corrected) are recommended for indirect effects.",
  wrong = "Think about the mathematical structure of the indirect effect: it is a product of two estimated path coefficients. What does this imply about the shape of its sampling distribution?"
)
```

---

# Model Comparison and Modification {#modelcomp}

::: {.callout-note}
## Section Overview

**What you will learn:** How to compare alternative SEM specifications using formal tests and fit indices, and how to use modification indices responsibly.

**Key concepts:** Nested models, likelihood ratio (chi-square difference) test, AIC/BIC, modification indices.
:::

## Why compare models? {-}

In practice, researchers often have competing theoretical models — alternative specifications that make different predictions about which paths should be present or absent. SEM provides tools for formally comparing such models. Two situations arise:

1. **Nested models**: Model A is a special case of Model B (Model A is Model B with one or more paths fixed to zero). These can be compared with a **chi-square difference test (Δχ²)**.
2. **Non-nested models**: Neither model is a special case of the other. These are compared using **information criteria** (AIC, BIC): lower values indicate better fit, penalised for model complexity.

## Comparing a constrained model {-}

Suppose a reviewer argues that the direct path from Self-Efficacy to Writing Score is unnecessary and that all of Self-Efficacy's influence on Writing Score is mediated through Motivation.
We test this by fitting a **constrained model** with the direct *EFF* → *writing_score* path removed:

```{r nested-mod, message=FALSE, warning=FALSE}
constrained_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (direct EFF -> writing_score path removed) ---
  MOT ~ EFF
  writing_score ~ ANX + MOT
'

constrained_fit <- lavaan::sem(constrained_model, data = semdata, estimator = "ML")

lavaan::lavTestLRT(constrained_fit, sem_fit)
```

A significant Δχ² (*p* < .05) indicates that the constrained model fits significantly worse — that is, removing the direct path causes a significant deterioration in fit, providing evidence that the direct path contributes meaningfully and should be retained.

```{r aic-bic, message=FALSE, warning=FALSE}
data.frame(
  Model = c("Full model (with direct EFF path)", "Constrained model (no direct EFF path)"),
  AIC = round(c(AIC(sem_fit), AIC(constrained_fit)), 1),
  BIC = round(c(BIC(sem_fit), BIC(constrained_fit)), 1)
) |>
  flextable() |>
  flextable::set_table_properties(width = .80, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Model comparison: AIC and BIC for the full and constrained models.") |>
  flextable::border_outer()
```

The preferred model has the **lower** AIC (and lower BIC). A difference of more than 10 in BIC is generally considered strong evidence in favour of the model with the lower value.

## Modification indices {-}

If a model fits poorly, **modification indices (MIs)** can help diagnose which additional paths or covariances would most improve fit. Each MI indicates how much the overall model χ² would decrease if a particular currently-fixed parameter were freed.

```{r mod-indices, message=FALSE, warning=FALSE}
mi <- lavaan::modindices(sem_fit, sort. = TRUE, maximum.number = 10)

mi |>
  dplyr::select(lhs, op, rhs, mi, epc) |>
  dplyr::mutate(across(c(mi, epc), ~round(.x, 3))) |>
  dplyr::rename(LHS = lhs, Operator = op, RHS = rhs, MI = mi, `Expected Parameter Change` = epc) |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Top 10 modification indices (sorted by MI, descending). MI > 10 typically warrants attention.") |>
  flextable::border_outer()
```

::: {.callout-important}
## Using modification indices responsibly

Modification indices are a double-edged sword. They are useful for diagnosing *systematic* misfit (e.g., correlated residuals between items that share method variance). However, acting on every high MI and re-fitting the model is a form of **capitalising on chance**: the revised model will fit the current sample better but may not generalise.

Rules of thumb for responsible use [@jackson2009reporting]:

1. **Theory first**: only free a parameter if there is a substantive, theoretically defensible reason to do so.
2. **One at a time**: modify one parameter, re-fit, re-inspect — do not free multiple parameters simultaneously.
3. **Cross-validate**: if sample size permits, split the data and use one half to explore modifications and the other to confirm them.
4. **Report transparently**: if modifications were made post-hoc, report this explicitly and distinguish the revised model from the originally hypothesised model.
:::

---

::: {.callout-tip}
## Exercises: Model Comparison
:::

**Q1. What does a significant chi-square difference test (Δχ²) between two nested models indicate?**

```{r}
#| echo: false
#| label: "MC_Q1"
check_question(
  "The more constrained (simpler) model fits significantly worse than the less constrained (more complex) model — the freed parameter(s) contribute meaningfully to model fit",
  options = c(
    "The more constrained (simpler) model fits significantly worse than the less constrained (more complex) model — the freed parameter(s) contribute meaningfully to model fit",
    "The two models are equivalent in fit and either can be used",
    "The more complex model should always be rejected in favour of parsimony",
    "The chi-square difference test only applies to non-nested models"
  ),
  type = "radio",
  q_id = "MC_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! When Model A is nested within Model B (A has more constraints / fewer free parameters), the delta chi-square = chi-square(A) - chi-square(B) follows, under the null hypothesis that the constraints hold, a chi-square distribution with degrees of freedom equal to the difference in df between the two models. A significant result (p < .05) means the extra parameters in Model B account for a statistically significant improvement in fit — the simpler model does not fit as well. A non-significant result supports the simpler (more parsimonious) model.",
  wrong = "Think about what it means for a more constrained model to have a higher chi-square. A significant delta chi-square means the constraints imposed in the simpler model cause a significant deterioration in fit. What does that imply about those constraints?"
)
```

**Q2. A modification index of 24.5 suggests adding a cross-loading of `anx2` onto the EFF factor. Should you add this path?**

```{r}
#| echo: false
#| label: "MC_Q2"
check_question(
  "Not necessarily — only if there is a substantive theoretical justification. Adding paths purely because their MI is large constitutes post-hoc model fishing and inflates Type I error.",
  options = c(
    "Not necessarily — only if there is a substantive theoretical justification. Adding paths purely because their MI is large constitutes post-hoc model fishing and inflates Type I error.",
    "Yes — any MI above 10 must be freed to achieve acceptable model fit",
    "Yes — a larger MI always means the path is theoretically meaningful",
    "No — modification indices should never be consulted after model fitting"
  ),
  type = "radio",
  q_id = "MC_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! A high MI tells you that freeing a parameter would improve statistical fit, but it says nothing about whether that parameter is *theoretically meaningful*. Freeing every high-MI parameter is a form of capitalising on chance — the model becomes over-fitted to the current sample and will not generalise. Always ask: 'Is there a substantive reason why anx2 should also reflect self-efficacy?' If the answer is no, the path should not be added, regardless of the MI value.",
  wrong = "A high modification index means a path would improve chi-square fit — but does a better chi-square automatically mean the path is theoretically defensible?"
)
```

---

# Reporting Standards {#reporting}

::: {.callout-note}
## Section Overview

**What you will learn:** What to report in an SEM study, model reporting paragraph templates, a workflow summary table, and a reporting checklist.
:::

Reporting SEM results clearly and completely is as important as the analysis itself.

---

## General principles {-}

::: {.callout-note}
## What to report in an SEM study

Following current best practice [@kline2023principles; @jackson2009reporting; @larsson2021sem]:

**Model specification**

- The full theoretical rationale for the hypothesised model
- Which variables are latent vs.
  observed; which indicators load onto which factors
- Software and estimator used (e.g., "Models were estimated in R using the `lavaan` package [@rosseel2012lavaan] with Maximum Likelihood estimation")

**Measurement model (CFA)**

- Standardised factor loadings for all indicators (with SEs and significance)
- All model fit indices: χ²(df), CFI, TLI, RMSEA (with 90% CI), SRMR
- Scale reliabilities (McDonald's ω or Cronbach's α)

**Structural model**

- Standardised path coefficients (with SEs and significance)
- R² for all endogenous variables
- Model fit indices

**Mediation (if applicable)**

- Labelled paths (a, b, c/c'), indirect effect, total effect
- Bootstrapped confidence intervals (state number of resamples)
- Whether partial or full mediation was found

**Model comparisons (if applicable)**

- Δχ², Δdf, p-value for nested comparisons
- AIC/BIC for non-nested comparisons
:::

---

## Model reporting paragraphs {-}

### CFA

> A three-factor measurement model was specified *a priori* based on the theoretical framework, with Language Anxiety (*ANX*), Writing Self-Efficacy (*EFF*), and Motivation (*MOT*) each indicated by three Likert-scale items (nine indicators in total). The model was estimated using Maximum Likelihood in R (`lavaan`; @rosseel2012lavaan). Model fit was excellent: χ²(df) = *X.XX*, CFI = .97, TLI = .96, RMSEA = .04 [90% CI: .01, .07], SRMR = .04. All standardised factor loadings were significant and exceeded 0.70 (range: .71–.82), and McDonald's ω exceeded .80 for all three scales, indicating good reliability. The measurement model was retained for subsequent structural analysis.

### Full SEM

> The structural model specified directional effects of Language Anxiety, Writing Self-Efficacy, and Motivation on Writing Score, and an effect of Self-Efficacy on Motivation. Model fit was acceptable: χ²(df) = *X.XX*, CFI = .96, TLI = .95, RMSEA = .05 [90% CI: .02, .07], SRMR = .05.
Writing Self-Efficacy was a significant positive predictor of both Motivation (β = .55, SE = .07, *p* < .001) and Writing Score (β = .47, SE = .08, *p* < .001). Language Anxiety was a significant negative predictor of Writing Score (β = −.38, SE = .07, *p* < .001). Motivation significantly predicted Writing Score (β = .21, SE = .07, *p* = .003). Together, the predictors explained 58% of the variance in Writing Score.

### Mediation

> To test whether the effect of Writing Self-Efficacy on Writing Score was partially mediated by Motivation, we re-estimated the model with labelled paths and requested 1000 bootstrap resamples for inference on the indirect effect [@fuoli2022sem]. The indirect effect of Self-Efficacy on Writing Score via Motivation was significant (unstandardised *b* = *X.XX*, 95% bias-corrected bootstrap CI [*X.XX*, *X.XX*]), indicating that part of the positive effect of self-efficacy on writing performance operates through increased motivation. The direct effect of Self-Efficacy on Writing Score remained significant after accounting for this indirect path, supporting **partial mediation**.

---

## Quick reference: SEM workflow {-}

```{r workflow-table, echo=FALSE, message=FALSE, warning=FALSE}
data.frame(
  Step = c(
    "1. Theoretical specification",
    "2. Descriptive checks",
    "3. Confirmatory Factor Analysis",
    "4. Evaluate measurement fit",
    "5. Assess reliability",
    "6. Full SEM",
    "7. Mediation (if applicable)",
    "8. Model comparison",
    "9. Report"
  ),
  Action = c(
    "Draw path diagram; specify which indicators load onto which factors and which structural paths are hypothesised",
    "Examine distributions (skewness, kurtosis), correlations; check for multivariate outliers",
    "Fit measurement model with lavaan::cfa()",
    "Inspect CFI, TLI, RMSEA, SRMR against recommended thresholds",
    "Compute McDonald's omega with semTools::reliability()",
    "Add structural paths; fit with lavaan::sem()",
    "Label paths; define indirect/total effects with ':='; use se = 'bootstrap'",
    "Use lavTestLRT() for nested models; AIC/BIC for non-nested; consult MIs with theory",
    "Report all fit indices, standardised loadings, path coefficients, R2, and effect CIs"
  ),
  `Key R function(s)` = c(
    "—",
    "psych::describe(); cor()",
    "lavaan::cfa()",
    "lavaan::fitMeasures()",
    "semTools::reliability()",
    "lavaan::sem()",
    "lavaan::sem(se = 'bootstrap')",
    "lavaan::lavTestLRT(); AIC(); modindices()",
    "lavaan::standardizedsolution(); parameterEstimates()"
  ),
  check.names = FALSE
) |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Step-by-step SEM workflow with key R functions.") |>
  flextable::border_outer()
```

# Citation & Session Info {.unnumbered}

::: {.callout-note}
## Citation

```{r citation-callout, echo=FALSE, results='asis'}
cat(
  params$author, ". ",
  params$year, ". *",
  params$title, "*. ",
  params$institution, ". ",
  "url: ", params$url, " ",
  "(Version ", params$version, "), ",
  "doi: ", params$doi, ".",
  sep = ""
)
```

```{r citation-bibtex, echo=FALSE, results='asis'}
# Build a citation key from the author's surname, the year, and the first word of the title
key <- paste0(
  tolower(gsub(" ", "", gsub(",.*", "", params$author))),
  params$year,
  tolower(gsub("[^a-zA-Z]", "", strsplit(params$title, " ")[[1]][1]))
)
cat("```\n")
cat("@manual{", key, ",\n", sep = "")
cat(" author = {", params$author, "},\n", sep = "")
cat(" title = {", params$title, "},\n", sep = "")
cat(" year = {", params$year, "},\n", sep = "")
cat(" note = {", params$url, "},\n", sep = "")
cat(" organization = {", params$institution, "},\n", sep = "")
cat(" edition = {", params$version, "},\n", sep = "")  # comma needed before the final field
cat(" doi = {", params$doi, "}\n", sep = "")
cat("}\n```\n")
```
:::

```{r fin}
sessionInfo()
```

::: {.callout-note}
## AI Transparency Statement

This tutorial was re-developed with the assistance of **Claude** (claude.ai), a large language model created by Anthropic. Claude was used to help revise the tutorial text, structure the instructional content, generate the R code examples, and write the `checkdown` quiz questions and feedback strings. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for the accuracy and pedagogical appropriateness of the material. The use of AI assistance is disclosed here in the interest of transparency and in accordance with emerging best practices for AI-assisted academic content creation.
:::

[Back to top](#intro)

[Back to HOME](/index.html)

# References {.unnumbered}
